TECHNOMETRICS 


A Journal of Statistics for the Physical, Chemical and Engineering Sciences 


Published Quarterly by 


THE AMERICAN SOCIETY FOR QUALITY CONTROL 
and the 
AMERICAN STATISTICAL ASSOCIATION 


Editor 


J. Stuart Hunter 


Associate Editors 


William Allen              David R. Howes 
G. A. Barnard              Fred C. Leone 
C. A. Bennett              Frank Proschan 
Cuthbert Daniel            Leo Tick 
Besse B. Day               Martin Wilk 
R. J. Hader                Marvin Zelen 


Management Committee 
Paul S. Olmstead, Chairman 


For the For the 
American Society for Quality Control American Statistical Association 


Warren R. Purcell                       Irving Burr 
Maynard Renner                          Churchill Eisenhart 
H. L. Wenrry                            Donald C. Riley 


Published Quarterly in February, May, August and November by the 
Technometrics Management Committee of the American Society for Quality 
Control and the American Statistical Association. Editorial Office: Mathematics 
Research Center, U. S. Army; The University of Wisconsin, Madison, Wisconsin. 
Publication Office: Wm. Byrd Press, P. O. Box 2W, Richmond 5, Virginia. Second 
class mailing privileges granted at Richmond, Virginia. 


Composed and Printed at the 


William Byrd Press, Inc., Richmond, Virginia, U.S.A. 





CONTENTS 


TECHNOMETRICS, Vol. 3, No. 2, MAY, 1961 


General Considerations in the Analysis of Spectra 
G. M. Jenkins 


Mathematical Considerations in the Estimation of Spectra 
Emanuel Parzen 


Discussion, Emphasizing the Connection Between Analysis of 
Variance and Spectrum Analysis 
John W. Tukey 


Some Comments on Spectral Analysis of Time Series 
N. R. Goodman 


Comments on the Discussions of Messrs. Tukey and Goodman 
G. M. Jenkins and Emanuel Parzen 


Spectral Analysis Combining a Bartlett Window with an 
Associated Inner Window 
Thomas A. Wonnacott 


Frequency Response from Stationary Noise: Two Case Histories 
N. R. Goodman, S. Katz, B. H. Kramer and M. T. Kuo 


The Modified Gauss-Newton Method for the Fitting of 
Non-Linear Regression Functions by Least Squares 
H. O. Hartley 


On the Possibility of Improving the Mean Useful Life of 
Items by Eliminating Those with Short Lives 
G. S. Watson and W. R. Wells 


Book Review C. Daniel 
Statistical Programs for High Speed Computers 


Notices 




























Vol. 3, No. 2 TECHNOMETRICS May, 1961 


General Considerations in the Analysis of Spectra 


G. M. JENKINS* 





TABLE OF CONTENTS 
Part 1: Physical Aspects 

1. Introduction 
2. Statement of the Problem 
3. Definition and Physical Meaning of the Spectrum 
4. Frequency Response 
5. Filters 
6. Aims and Means in Time-Series Analysis 


Part 2: Statistical Aspects 


7. Continuous Records and Aliasing 
8. Kernels and Windows 
9. The Periodogram 
10. General Spectral Estimates 
11. Heuristic Treatment of the Sampling Properties of Spectral Estimates 
12. The Blackman-Tukey Approach 
13. Design and Analysis Criteria 
14. The Logarithmic Transformation of the Spectral Density 
15. Summary 
References 


Part 1: PHYSICAL ASPECTS 









1. INTRODUCTION 


The object of this paper is to present a simplified account of the motivation 
behind the spectral analysis of time-series and to give a heuristic discussion of 
the statistical problems which arise. It is directed mainly at statisticians with 
little experience of the theory and applications of spectral analysis. It is of 
necessity short and omits a great deal of important detail. A much lengthier 
exposition for non-statisticians is in the course of preparation. 








2. STATEMENT OF THE PROBLEM 


The starting point in any spectral analysis is a function of time X(t) defined 
in an interval 0 < t < T relative to an arbitrary origin. In general, X(t) will 
exhibit apparently widely fluctuating properties to the eye and is called a 
time-series or a random noise. In more general situations there will be m such 
functions $X_i(t)$ $(i = 1, 2, \ldots, m)$, each of which is observed over 
several ($n_i$) time intervals $T_{ij}$, $j = 1, \ldots, n_i$. For simplicity 
we confine ourselves almost entirely to a single time-function defined in a 
single interval. In many applications it is not clear as to what extent time t 
(or some other physical quantity) is the important independent variable in 
X(t). Thus, if X(t) represents the random displacements of an aircraft wing 
when flown through a region of turbulence, the dominant characteristic 
measured by t is the path in space traveled by the aircraft. 
However, it is not possible to say that we are recording fluctuations at a fixed 
time along the path nor at a fixed point for a length of time, but rather a mixture 
of these two ideals. 

Whilst such a trace (a graph of X(t) against t) will form the starting point 
of the statistical analysis, important considerations will have gone into the 
question of how to record the data; these are not discussed here. The most 
important point to remember at this stage is that X(t) will in most cases have 
been measured by an instrument (a transducer), then passed in the form of an 
electrical signal to another instrument and finally recorded. These instruments 
will immediately impose restrictions on the amount of information which can 
be obtained from the record in the sense that there will be higher frequencies 
to which the instruments are completely insensitive. This means that the 
apparent continuity in the trace is illusory; all physical time-series are essentially 
discrete or band limited (i.e., frequency limited). 

Roughly speaking, spectral analysis is concerned with an examination of 
X(t) from the point of view of its frequency content. More generally, cross 
spectral analysis between $X_i(t)$ and $X_j(t)$ is concerned with the interactions 
or correlations between such pairs of variables at each frequency. 

It is clear that the first question which needs to be answered is why in certain 
situations (and not in others) this is a convenient method of analysing the 
data. This is fairly obvious to the physicist and engineer but may be puzzling 
to most statisticians who are normally accustomed to working with correlations 
in the time domain. The writer is convinced that in order to understand the 
motivation behind spectral analysis, the statistician should become familiar 
with the notion of the frequency response of mechanical and electrical systems 
and be reasonably acquainted with Fourier analysis techniques which are basic 
in the physical sciences; by way of introduction, frequency response techniques 
are discussed very briefly in part 1 of this account. With this background, it 
is relatively easier to understand the basic elements involved in the statistical 
estimation of spectra which are discussed in part 2. 

In contrast to the statistician, most physicists and engineers who are familiar 
with frequency response and Fourier methods find the motivation clear but 
their familiarity with these techniques presents a stumbling block in under- 
standing why Fourier methods are inappropriate without considerable modifi- 
cation when applied to the analysis of noise data. It is necessary to stress these 
two different aspects when introducing the subject to the two groups. 


3. DEFINITION AND PHYSICAL MEANING OF THE SPECTRUM 


In classical problems in statistics, we are faced with observations $X_i$ which 
are independent because the experiments which generate these measurements 
are themselves physically independent. In such situations, it is usually assumed 
that the errors have a common but generally unknown distribution, the 
variability of which is often characterised by its variance 

$$\sigma^2 = E(X - \mu)^2 = \int (X - \mu)^2 \, p(X) \, dX \tag{3.1}$$

where $E(X) = \mu$ is the mean value of X, which is assumed to be a constant, 
and p(X) is the probability density function of the errors. If the errors are 
Gaussian or normal, then $\mu$ and $\sigma^2$ characterise the distribution completely. 

When X becomes X(t), a function of time or space, another factor enters 
because consecutive or contiguous values of X(t) are correlated so that in 
addition to $\mu$ and $\sigma^2$, we need to specify the simplest multivariate moments, 
namely the covariances between the values of X(t) at different times. If interest 
is focussed on n time points $t_1, t_2, \ldots, t_n$, this will involve the 
calculation of $\binom{n}{2}$ covariances in the most general case, viz., 

$$\gamma_{ij} = E\{(X_i - \mu)(X_j - \mu)\} = E\{(X(t_i) - \mu)(X(t_j) - \mu)\}. \tag{3.2}$$


If any headway is to be made in making a statistical analysis of X(t), in particular 
if a single observed series is to give meaningful information about the ensemble 
averages (3.1) and (3.2), it is necessary as a minimum requirement to assume 
stationarity, i.e., that all statistical properties depend on differences $t_i - t_j$ 
rather than on the $t_i$ and $t_j$ themselves. This effectively reduces the number of 
covariances to be estimated to (n - 1), these being the so-called auto-covariances 

$$\gamma_k = E\{(X_i - \mu)(X_{i+k} - \mu)\}. \tag{3.3}$$


If it is legitimate to make the assumption that in addition to stationarity, the 
statistical properties of X(t) are governed by a Gaussian (or normal) process, 
i.e., the joint distribution of X(t) values at any set of times $t_1, t_2, \ldots, t_n$ is 
a multivariate normal distribution, then $\mu$, $\sigma^2$ and $\gamma_k$ are sufficient to characterise 
the behavior of X(t) completely. Some remarks about non-gaussian and 
non-stationary time-series will be made at the end of this section and in section 6. 

If the assumptions of near (approximate) stationarity and near normality 
are justified, then it would seem that all that is required, since $\mu$, $\sigma^2$ and $\gamma_k$ 
provide complete descriptions of X(t), is to estimate these quantities from the 
sample as efficiently as possible. Quite apart from the question of the statistical 
interpretation of the estimated quantities, there are strong physical grounds 
for looking at other functions rather than the auto-covariances. 

To illustrate this, suppose that X(t) were a mathematical function of time 
of the form (3.4): 


$$X(t) = A + B \cos(\omega t + \Phi). \tag{3.4}$$


Hence X(t) could be regarded as a description of an alternating voltage of 
frequency $\omega$, amplitude B, mean value A and phase $\Phi$ relative to an arbitrary 
time origin. Then, if X(t) is measured in volts, the instantaneous power 
generated in a given resistance at a given time t is proportional to $X^2(t)$, 
measured in watts. Further, the average power per complete cycle (obtained by 
integrating $X^2(t)$) is given by 

$$\mathrm{Ave}\,(X^2(t)) = A^2 + \tfrac{1}{2} B^2. \tag{3.5}$$


We may thus regard the average power per cycle as being made up of two 
components: a time invariant d.c. (direct current) component $A^2$ and a time 
dependent a.c. (alternating current) component $\tfrac{1}{2} B^2$. The a.c. component may 
be rewritten in the form 

$$\mathrm{Ave}\,[X(t) - A]^2 = \tfrac{1}{2} B^2 \tag{3.6}$$


and since the averages in (3.5) and (3.6) are over time, (3.6) is simply the sample 
variance of the function X(t) if taken over a complete number of cycles. If 
X(t) is imagined as being composed of an infinite number of alternating 
currents, superimposed on one another, then the sample variance (the total power) 
may be decomposed into components (of power) corresponding to each 
frequency as demonstrated in (3.7): 

$$\mathrm{Ave}\,[X(t) - A]^2 = \tfrac{1}{2} \sum_i B_i^2. \tag{3.7}$$
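
The decomposition in (3.5) to (3.7) can be checked numerically. The following is a minimal sketch, not part of the original paper: the amplitudes, frequencies and phases are hypothetical values chosen so that the averaging window contains a whole number of cycles of every term.

```python
import numpy as np

# Illustrative (hypothetical) values; none of these come from the paper.
A = 1.5                               # d.c. level
B = np.array([2.0, 0.5, 1.0])         # amplitudes B_i
omega = np.array([1.0, 3.0, 7.0])     # frequencies, integer multiples of 1 rad/s
phi = np.array([0.7, 2.1, 4.0])       # arbitrary phases

# The window [0, 2*pi) holds a whole number of cycles of every component.
t = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
x = A + (B[:, None] * np.cos(omega[:, None] * t + phi[:, None])).sum(axis=0)

total_power = np.mean(x ** 2)         # time average of X^2(t), cf. (3.5)
ac_power = np.mean((x - A) ** 2)      # time average of [X(t) - A]^2, cf. (3.7)

print(total_power, A ** 2 + 0.5 * np.sum(B ** 2))
print(ac_power, 0.5 * np.sum(B ** 2))
```

Each printed pair should agree to rounding error, whatever phases are chosen, which is why the same identity can also be read as an ensemble average over random phases.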


It is also possible to regard (3.7) as being true as an ensemble average if in 
successive samples of X(t), the phases $\Phi_i$ of the alternating currents are chosen 
randomly in the interval $(0, 2\pi)$. If X(t) is a more general statistical fluctuation, 
i.e., the behavior of X(t) at different times is governed by given probability 
laws, this analogy breaks down formally. However, (3.7) may be replaced by 
a similar formula, viz., 

$$\sigma^2 = \int_0^{\infty} F(\omega) \, d\omega + \tfrac{1}{2} \sum_i B_i^2. \tag{3.8}$$


It may be seen that in addition to the components of power or variance due 
to the genuine periodic terms, there is a further term representing contributions 
to the total variance from a continuous spectrum of frequencies. If a decomposition 
of the form (3.8) is possible, i.e., there are lines in the spectrum corresponding 
to genuine periodic terms together with a continuous distribution in frequency 
then the spectrum is said to be mixed. 


In situations where there are no lines, (3.8) may be written as 

$$1 = \int_0^{\infty} \frac{F(\omega)}{\sigma^2} \, d\omega \tag{3.9}$$

and $f(\omega) = F(\omega)/\sigma^2$ is called the spectral density function. 

Formally, therefore, the spectral density function is analogous to the 
probability density function and the line spectrum (if suitably normalized) 
corresponds to a probability mass function or discrete probability distribution. The 
analogy is more than formal since the problems of statistical estimation of 
the probability density and spectral density functions are closely related. If 
both lines and a continuous spectrum are present, we may calculate the 
integrated spectrum $H(\omega)$ which represents the total power or variance in the 
frequency interval $(0, \omega)$ so that $\sigma^2 = H(\infty)$. Again by normalising, we may 
define $h(\omega) = H(\omega)/\sigma^2$ so that $h(\infty) = 1$. 

It may be shown that a stationary gaussian series is completely characterized 
by its integrated spectrum $H(\omega)$, and by its spectral density function $f(\omega)$ if 
no lines are present. It follows, therefore, that there must be a relation between 
$f(\omega)$ and the autocovariances $\gamma(k)$. In fact, they turn out to be Fourier transforms 
of one another, i.e., 

$$f(\omega) = \frac{2}{\pi} \int_0^{\infty} \rho(k) \cos \omega k \, dk \tag{3.10}$$

where $\rho(k) = \gamma(k)/\sigma^2$ are the autocorrelations, satisfying the property $\rho(k) = 
\rho(-k)$ and $\rho(0) = 1$ since $\gamma(0) = \sigma^2$. 

A spectrum of special importance in the theory occurs when the time series 
is defined at equidistant intervals $\Delta t$, and the autocovariances at non-zero 
lags are all zero. These conditions would be satisfied, for example, by a 
mechanism for generating random numbers. In this case, the spectral density is 
given by 

$$f(\omega) = \frac{\Delta t}{\pi}, \qquad 0 \le \omega \le \pi/\Delta t \tag{3.11}$$

where $\pi/\Delta t$ is the maximum frequency which can be measured and depends on 
$\Delta t$, the length of the time interval. Thus, a constant spectral density 
corresponds to zero autocovariances or independence of the errors and is called 
white noise by physicists and engineers because of its resemblance to the optical 
spectrum of white light which consists of very narrow lines close together. 
Close approximations to white noises with normal distributions are very often 
generated in physical systems due to central limit effects. As examples we 
may cite thermal noise and shot noise which are described in some detail by 
Laning and Battin [1]. 
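
As a rough illustration of the white-noise case, the sketch below generates pseudo-random normal numbers and computes their sample autocorrelations, which should all be close to zero at non-zero lags. The estimator `sample_autocorr`, the seed, the series length and the unit sampling interval are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed so the run is repeatable
n = 10_000
x = rng.standard_normal(n)            # stand-in for a random number mechanism

def sample_autocorr(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (divisor n at every lag)."""
    d = series - series.mean()
    c0 = np.dot(d, d) / len(d)
    return np.array([np.dot(d[:len(d) - k], d[k:]) / len(d) / c0
                     for k in range(max_lag + 1)])

r = sample_autocorr(x, 20)

dt = 1.0                              # assumed sampling interval
f_white = dt / np.pi                  # constant density of (3.11)
total = f_white * (np.pi / dt)        # integral of f over (0, pi/dt)

print(np.round(r[:5], 3))
print(total)
```

With n = 10,000 the sample autocorrelations at non-zero lags have standard error of roughly 0.01, so values of that size are consistent with a flat spectrum.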

It is necessary to bear in mind that there will be certain time series (non- 
normal series) for which the spectrum, or equivalently the autocorrelation 
function (a.c.f.), conveys only a limited amount of information about the 
statistical behaviour of the series; two time-series may possess the same spectra 
but yet have widely different statistical properties. Thus, a gaussian series 
with a.c.f. 

$$\rho(k) = e^{-\alpha |k|} \tag{3.12}$$

and spectral density 

$$f(\omega) = \frac{2\alpha}{\pi} \, \frac{1}{\alpha^2 + \omega^2} \tag{3.13}$$

which may be generated by subjecting gaussian white noise to a fairly simple 
averaging or filtering procedure (see section 5), will have the same spectral 
density as the so-called random telegraph signal. This is a time series which takes 
values $\pm 1$ alternately, the times at which a change of state takes place being 
points of a Poisson process. Its spectrum may be made to coincide with (3.13) 
by arranging so that the Poisson parameter $\lambda$ is given by $2\lambda = \alpha$. However, 
the statistical properties of these two series will be entirely different. In any 
specific situation, it is worth considering in advance how potentially useful is 
the information contained in the spectrum. If the distribution of X(t) is skew, 
it will sometimes be worthwhile to make a transformation of X(t) before 
conducting an analysis. 
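
The transform pair between the autocorrelations and the spectral density can be checked numerically for the exponential a.c.f. of (3.12). The sketch below assumes the one-sided cosine transform f(w) = (2/pi) * integral from 0 to infinity of rho(k) cos(wk) dk as the form of (3.10), and compares a trapezoidal approximation with the closed form (2 alpha/pi) / (alpha^2 + w^2). The value of alpha, the truncation point and the grid size are arbitrary choices for illustration.

```python
import numpy as np

alpha = 0.8                                   # illustrative decay rate

def rho(k):
    """Exponential autocorrelation function, as in (3.12)."""
    return np.exp(-alpha * np.abs(k))

def spectral_density_from_acf(acf, omega, k_max=60.0, n=600_001):
    """Trapezoidal approximation to (2/pi) * int_0^k_max acf(k) cos(omega k) dk."""
    k = np.linspace(0.0, k_max, n)
    y = acf(k) * np.cos(omega * k)
    dk = k[1] - k[0]
    integral = dk * (y.sum() - 0.5 * (y[0] + y[-1]))
    return 2.0 / np.pi * integral

def f_closed(omega):
    """Closed form corresponding to (3.13)."""
    return 2.0 * alpha / np.pi / (alpha ** 2 + omega ** 2)

for w in (0.0, 0.5, 2.0):
    print(w, spectral_density_from_acf(rho, w), f_closed(w))
```

The two columns should agree closely at each frequency. The check says nothing about the distribution of the series, which is the point made above: a gaussian series and the random telegraph signal can share this spectrum.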


4. FREQUENCY RESPONSE 


It may still seem unnatural to statisticians that one should want to look 
at the distribution of variance or power with frequency rather than the auto- 
correlations. In this section, one reason is given as to why this is a convenient 
method of summarising the data—this reason is dictated entirely by the use 
to be made of the end product of the analysis. For illustration, we give two 
examples: 

(a) Suppose that X(t) represented random fluctuations superimposed on the 
echo received by a radar antenna. The change of direction of the axis of the 
antenna is to be determined by its response to the received signal plus the 
noise which is generated as a result of imperfect reflection from the target and 
irregular propagation in the intervening medium. 

(b) Suppose that X(t) were the random velocities of wind gusts impinging 
on an aircraft wing. 

What is of direct interest to the engineer is how the system (the antenna or 
the aircraft) responds to X(t) and what are the statistical properties of the 
output Y(t) which result from the interaction of X(t) with the system. In (a), 
Y(t) would be the internal voltage generated in the antenna which would be 
used to determine the angle through which the axis rotates. In (b), Y(t) would 
be some property of the aircraft which it was desired to measure e.g., the fluc- 
tuating stresses at various points in the structure or the fluctuating accelerations 
at the center of gravity. 

It turns out that for certain systems, called linear systems, a simple and 
direct answer to this question may be given if the spectrum of X(t) is known 
reasonably accurately. Many physical systems can be treated as if they were 
approximately linear. 

Suppose that X(t) were a cosinusoidal disturbance 

$$X(t) = A \cos \omega t, \tag{4.1}$$

then for linear systems, the output Y(t) may be characterised by its response 
to this excitation in the form 

$$Y(t) = G(\omega) A \cos(\omega t + \Phi(\omega)) \tag{4.2}$$


where $G(\omega)$ is a gain or attenuation factor and $\Phi(\omega)$ is a phase shift. It is to be 
observed that the gain and phase shift are functions of frequency; the reason 
for this is that the response at a given time t depends (linearly) on what happened 
at all previous time instants, i.e., 

$$Y(t) = \int_0^{\infty} W(\tau) X(t - \tau) \, d\tau. \tag{4.3}$$


It may be shown further that the spectra of X(t) and Y(t) are related by the 
very simple formula 

$$\sigma_Y^2 f_Y(\omega) = \sigma_X^2 f_X(\omega) G^2(\omega) \tag{4.4}$$

and in view of the linearity of (4.3), if X(t) is normal and stationary, then so 
is Y(t) provided the function $W(\tau)$ satisfies certain (stability) conditions. Rather 
than work with a gain and phase separately, it is sometimes more convenient 
to work with 

$$\psi(\omega) = G(\omega) e^{i\Phi(\omega)} = \int_0^{\infty} W(\tau) e^{-i\omega\tau} \, d\tau$$

where $\psi(\omega)$ is called the frequency response function*; it is clear therefore that 
$G(\omega) = |\psi(\omega)|$ and $\Phi(\omega) = \arg \psi(\omega)$. 
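
To make the gain and phase concrete, the sketch below evaluates the frequency response function numerically for a hypothetical weighting function W(tau) = a exp(-a tau), which is not discussed in the paper. It assumes the convention psi(w) = integral from 0 to infinity of W(tau) exp(-i w tau) d tau, for which this choice of W has the closed form a / (a + i w).

```python
import numpy as np

a = 2.0   # decay rate of the hypothetical weighting function W(tau) = a * exp(-a * tau)

def freq_response(omega, tau_max=40.0, n=400_001):
    """Trapezoidal approximation to int_0^tau_max W(tau) exp(-i omega tau) d tau."""
    tau = np.linspace(0.0, tau_max, n)
    y = a * np.exp(-a * tau) * np.exp(-1j * omega * tau)
    d = tau[1] - tau[0]
    return d * (y.sum() - 0.5 * (y[0] + y[-1]))

omega = 3.0
psi = freq_response(omega)
gain = abs(psi)             # G(omega) = |psi(omega)|
phase = np.angle(psi)       # Phi(omega) = arg psi(omega)

# Closed form for this particular W: psi(omega) = a / (a + i * omega).
gain_exact = a / np.sqrt(a ** 2 + omega ** 2)
phase_exact = -np.arctan(omega / a)

print(gain, gain_exact)
print(phase, phase_exact)
```

For this W the gain falls steadily with frequency, so by (4.4) the output spectrum is the input spectrum with its high frequencies attenuated by the factor G^2.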

The actual calculation of $G(\omega)$ and $\Phi(\omega)$ for any specific system can 
sometimes be a very difficult mathematical problem, involving the solution of the 
differential equations governing the oscillatory properties of the system. 
Alternatively, $G(\omega)$ and $\Phi(\omega)$ may sometimes be determined empirically by calibrating 
the system using sine waves of different frequencies. 

Once $G(\omega)$ is known, then knowledge of the spectrum of X(t) immediately 
conveys information about the spectrum of Y(t) via (4.4); this information 
is of immense value to the design engineer. 

(1) In example (b) above, if $f_Y(\omega)$ were to give rise to sharp peaks (or 
resonances), then it would be necessary to alter $\psi(\omega)$, i.e., redesign the aircraft. 

(2) In example (a) above, it might be possible to take steps to eliminate 
(i.e., filter) the oscillations due to X(t) before they enter the system or if this 
were not possible, to alter the frequency response function of the system by 
suitable redesign, as in (1). 

(3) Referring to (4.4), it may be seen that $G(\omega)$ may be calculated if $f_X(\omega)$ 
and $f_Y(\omega)$ have been estimated. In other words, spectrum analysis affords a 
means of calibrating physical systems. It is to be observed, however, that if 
the estimation of $G(\omega)$ is the main objective, then a more efficient method, which 
also enables the phase $\Phi(\omega)$ to be calculated, is available in the form of 
cross spectral analysis. This topic will be discussed at greater length in a future 
publication. For a discussion of this problem, the reader is referred to 
Goodman [23]. 


5. FILTERS 


Closely related to the notion of frequency response is the apparatus which 
is used to investigate frequency response in a variety of situations, viz., a filter. 
An electrical filter, for example, is an electrical circuit consisting of inductances 
and capacitances arranged so that from a given input to the circuit, the filter 
will accept a range of frequencies with much greater sensitivity than others. 
A linear filter produces from an input X(t), an output Y(t) according to (4.3) 
which is a weighted average of all the inputs over its past history. A filter, 
therefore, is a device with a tailored $G(\omega)$ in (4.4), which may be used to produce 
an output in a required frequency range. 

A digital filter is simply a symmetrical moving average in which care has 
been taken so that its properties interpreted in the frequency domain satisfy 
certain requirements. Thus a discrete time series $X_s$, $s = 0, \pm 1, \pm 2, \ldots$ may 
be subjected to a filtering operation by calculating from it an output $Y_s$ by 
means of the relation 

$$Y_s = \sum_{j=-k}^{+k} \lambda_j X_{s-j} \tag{5.1}$$

where the $\lambda_j$ are suitably chosen weights satisfying $\lambda_j = \lambda_{-j}$. 


* When regarded as a function of $z = e^{i\omega}$, this is called the transfer function. 

It is necessary to observe one important difference between (5.1) and (4.3). 
Whereas physical filters operate on the past history of X(t) only, there is no 
such limitation as far as (5.1) is concerned, which is a weighted average over 
both past and future values. As with equations (4.1) and (4.2), a periodic input 
(which we may represent conveniently in complex form) $X_s = A e^{i\omega s}$ is 
converted into an output $Y_s$, where 

$$Y_s = A e^{i\omega s} \, T(\omega) \tag{5.2}$$

and 

$$T(\omega) = \sum_{j=-k}^{+k} \lambda_j e^{-i\omega j}. \tag{5.3}$$


$T(\omega)$ is sometimes called the filter factor but more commonly the frequency 
response function of the filter. It is customary to talk about low pass, high 
pass, or band pass filters according as $T(\omega)$ is arranged so as to accept low 
frequencies, high frequencies, or a band of intermediate frequencies. 

If the input to the filter contains a continuous spectrum of frequencies, 
then the spectral density of the output will be exactly as in equation (4.4) 
with $G^2(\omega)$ replaced by $|T(\omega)|^2$ owing to the fact that $T(\omega)$ was defined as a 
complex quantity. 

On integration of this result over all frequencies, we obtain 

$$\sigma_Y^2 = \sigma_X^2 \int_0^{\pi} f_X(\omega) \, |T(\omega)|^2 \, d\omega \tag{5.4}$$

which shows that the variance of the output is a weighted average over the 
spectrum of the input. 
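
The relation between output variance and input spectrum can be illustrated with a small symmetric moving average applied to white noise. In the sketch below the weights are hypothetical, the sampling interval is taken as one time unit so that frequencies run over (0, pi), and T(w) is taken as the sum of lambda_j exp(-i w j). For white noise the weighted integral should reduce to the sum of squared weights.

```python
import numpy as np

k = 2
lam = np.array([1.0, 2.0, 3.0, 2.0, 1.0]) / 9.0   # hypothetical weights, lam[j + k] = lambda_j
j = np.arange(-k, k + 1)

omega = np.linspace(0.0, np.pi, 20_001)

# Frequency response T(omega) = sum_j lambda_j exp(-i omega j) at every grid frequency.
T = (lam * np.exp(-1j * np.outer(omega, j))).sum(axis=1)

# White-noise input with unit sampling interval: constant density 1/pi on (0, pi).
f_X = np.full_like(omega, 1.0 / np.pi)

# Trapezoidal approximation to int_0^pi f_X(omega) |T(omega)|^2 d omega.
y = f_X * np.abs(T) ** 2
d = omega[1] - omega[0]
variance_ratio = d * (y.sum() - 0.5 * (y[0] + y[-1]))   # sigma_Y^2 / sigma_X^2

print(variance_ratio, np.sum(lam ** 2))
print(abs(T[0]))    # response at zero frequency equals the sum of the weights
```

Because these weights sum to one, the filter passes zero frequency unchanged and reduces the variance of white noise to 19/81 of its input value, which is the behaviour expected of a low pass filter.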

As far as spectral analysis is concerned, the importance of digital filtering 
is two-fold. 

(1) Spectral analysis itself may be interpreted as performing operations 
such as (5.1) on the original series and then finding the variance of the various 
output series; alternatively it may be regarded as performing an operation 
such as (5.4) with $f_X(\omega)$ replaced by an estimate (the periodogram) which will 
be defined in a later section. 

(2) It is sometimes very useful to precede a spectral analysis by a filtering 
operation which removes low frequency trends or else separates the fluctuations 
into several bands for further investigation. This will be discussed briefly in 
the next section. 


6. AIMS AND MEANS IN TIME-SERIES ANALYSIS 


The previous sections, although not directly relevant to the statistical 
problem of spectral estimation as such, have been written with the intention of 
giving some motivation for spectral analysis to statisticians who might be 
puzzled by the fact that one would want to work in the frequency domain 
at all. In fact, the relative merits of the time-domain (autocorrelations and in 
the general case, cross correlations between time-series) and the frequency 
domain (spectra and cross spectra) have been the cause of some differences 
of opinion amongst statisticians in the past few years. It is being recognised 
now that spectral techniques are rapidly becoming one of the most important 
and useful statistical tools in the physical sciences. The autocorrelations, 
however, have played a very important role in the existing theory of time-series, 
e.g., they enter directly into the estimation equations for mixed moving 
average-autoregressive models of the form 

$$X_t + \alpha_1 X_{t-1} + \cdots + \alpha_k X_{t-k} = \beta_0 Z_t + \beta_1 Z_{t-1} + \cdots + \beta_m Z_{t-m} \tag{6.1}$$


where the $\{Z_t\}$ sequence is a sequence of uncorrelated random variables. Note 
that since $X_t$ can be expressed as a linear function of the $Z_t$ in this case, a mixed 
moving average-autoregression may be regarded as the output from a linear 
filter whose input $Z_t$ is white noise and whose transfer function is a rational 
function of $z = e^{i\omega}$. In fact, the example given by (3.12) and (3.13) arises from 
a special case of the continuous time analogue of (6.1) in which the stochastic 
difference equation is replaced by a stochastic differential equation 

$$\dot{X}(t) + \alpha X(t) = Z(t).$$
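
The view of an autoregression as filtered white noise can be sketched for the simplest discrete case, X_t + alpha X_{t-1} = Z_t. The example below is an illustration under stated assumptions rather than anything given in the paper: the value of alpha is arbitrary, the sampling interval is one time unit, and the autocorrelations are recovered from the normalised spectral density by a cosine integral over (0, pi).

```python
import numpy as np

alpha = -0.6      # illustrative coefficient: X_t + alpha * X_{t-1} = Z_t

omega = np.linspace(0.0, np.pi, 20_001)

# Squared gain of the filter: |1 / (1 + alpha * exp(-i omega))|^2.
gain2 = 1.0 / np.abs(1.0 + alpha * np.exp(-1j * omega)) ** 2

# For white-noise input, sigma_X^2 / sigma_Z^2 = 1 / (1 - alpha^2).
var_ratio = 1.0 / (1.0 - alpha ** 2)

# Normalised spectral density of X on (0, pi): flat input density 1/pi times gain^2.
f_X = gain2 / (np.pi * var_ratio)

def trapezoid(y, x):
    """Trapezoidal rule on an equally spaced grid."""
    d = x[1] - x[0]
    return d * (y.sum() - 0.5 * (y[0] + y[-1]))

area = trapezoid(f_X, omega)                                  # should be 1
rho = np.array([trapezoid(f_X * np.cos(omega * lag), omega)   # rho_lag from f_X
                for lag in range(5)])

print(area)
print(rho, (-alpha) ** np.arange(5))
```

The recovered autocorrelations decay geometrically as (-alpha)^k, the discrete counterpart of the exponential a.c.f. in (3.12), so the time-domain and frequency-domain descriptions of this model carry the same information.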


Many statisticians who use these models in their work, e.g., econometricians, 
have criticized the spectrum on the grounds that anything which can be done 
with the spectrum in the frequency domain can be done equally well with the 
autocorrelations in the time domain. From (3.10) it follows that knowledge 
of the population (or process) autocorrelations $\rho_k$ is equivalent to knowledge 
of the population spectrum $f(\omega)$. On the other hand, there are many who argue 
that the spectrum is the natural quantity to estimate almost always. 

It is our opinion that these represent two extreme points of view. As in all 
statistical applications, time-series problems will be motivated by certain ob- 
jectives of an essentially non-statistical nature; these will, in most cases, dictate 
whether a particular approach is desirable or not. A paper dealing with aims 
and means in time-series analysis is in the course of preparation [3]. In this 
section, we summarise a few of these considerations without entering into great 
detail. 

In a specific application, it would seem that there are at least three con- 
siderations which should enter into the choice between autocorrelations and 
spectra: 

(i) The use to be made of the estimated quantities. 

(ii) Ease of physical interpretation. 

(iii) Simplicity of sampling properties. 

We shall dispose of (iii) first by remarking that in general the sampling 
properties of spectra are considerably simpler than those of the 
autocorrelations. However, there are certain situations when this is not so. If, for example, 
tests of randomness were being made, using autocorrelations, on sets of random 
numbers, the sampling properties of the autocorrelations in this case are perfectly 
straightforward. One would rarely use spectra for tests of randomness unless, 
possibly, one were interested in alternatives to randomness of a cyclical nature 
and these cycles were of interest in themselves. There are other situations, 
for example, in the fitting of autoregressive schemes where, although the sampling 
properties of the autocorrelations are difficult, by working with the natural 
quantities to use in these situations, namely the partial autocorrelations, the 
sampling properties become quite simple. An example of the use of the partial 
autocorrelation function in fitting an autoregressive scheme, suggested by a 
priori physical considerations, to consecutive yields from a batch production 
process has recently been given by Jenkins and Chanmugam [4]. 

We proceed to illustrate (i) and (ii) above on some problems which are en- 
countered frequently. 


(a) Frequency response studies 


If the ultimate objectives of the analysis are dictated by considerations 
such as those discussed in section 4, then the spectrum certainly provides the 
answer to (i). Autocorrelations could be used for frequency response studies 
but the calculations would be considerably more difficult. Also, as indicated 
in section 4, the spectrum has a direct physical interpretation in this case. 


(b) Prediction and Simulation 


Since prediction is done in time, it is natural to work in the time domain. 
For the simpler linear prediction models, since the parameters are estimated 
by functions of the auto- and cross correlations, it is natural also to work with 
these quantities. 

By simulation is meant the fitting of empirical time-series models (usually 
but not always of a mixed moving average-autoregressive type) to observed 
series so that from these models hypothetical time-series may be generated whose 
properties resemble the observed series. The simulated series are often used 
in a variety of applications e.g., in investigating the effect of various noises on 
radar devices and missile control systems. 

It is necessary to emphasise that if sufficient data were available in these 
problems, no preliminary model fitting would be necessary since the data 
could be fed directly into the simulation. However, the quantity of data re- 
quired for such simulation studies is so large that the cost of acquiring the 
necessary amounts would be prohibitive. 

As far as (i) is concerned, it is variate values which are required for feeding 
into the simulation model; spectra are of no use in this type of problem. Further, 
the systems involved are generally so highly non-linear that equation (4.4) 
is inappropriate. 


(c) Exploratory Investigations 


There is no substitute for the construction of meaningful time series models 
based on a priori considerations from which meaningful physical parameters 
may be estimated. However, in the construction of such models, empirical 
analysis, especially spectral analysis, may be of considerable importance in 
suggesting possible models. For example Frenkiel and Schwartzchild [5] were 
able to show from a large number of spectral analyses that there were two 
different sources for the fluctuations in atmospheric turbulence; a low frequency 
component due to solar radiation and a high frequency component due to 
thermal instability of the atmosphere. This observation has been used as a 
starting point for a more detailed physical theory. 


(d) Non-Stationarity 


The spectrum loses much of its statistical significance in the case where 
the series is non-stationary i.e., its statistical properties such as the mean, 
variance and autocorrelation function, are changing in time. In the author's 
experience, there seem to be three types of series which arise in practice. 

(i) Those which exhibit stationary properties over fairly long time-periods 
e.g., deviations from a zero level such as tracking errors, outputs from 
noise generators etc. 

(ii) Those which are reasonably stationary if examined over a sufficiently 
short time or space period e.g., turbulence records, radar noises. 

(iii) Those for which the stationarity assumptions are manifestly untrue. 


In case (iii), which occurs very often, steps may have to be taken to remove 
the non-stationary component before conducting a spectral analysis. In 
certain situations, there may be some justification for removing trends in the 
mean using some system of orthogonal polynomials. A general technique which 
makes no such assumption and which is extremely useful, is to use a digital 
filter as mentioned in section 5 to filter off low frequency trends and then to 
conduct a spectral analysis on the residual series. 
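
The trend-removal step can be sketched with the simplest digital filter, an equally weighted symmetric moving average, whose output is subtracted from the series to leave a residual. The series below is synthetic and every number in it is hypothetical: a linear trend plus an oscillation of period 7, with a 21-term average chosen so that it spans exactly three cycles.

```python
import numpy as np

k = 10
lam = np.ones(2 * k + 1) / (2 * k + 1)     # equal weights, lambda_j = lambda_{-j}

s = np.arange(400)
trend = 3.0 + 0.05 * s                     # slowly varying (low frequency) component
osc = np.cos(2.0 * np.pi * s / 7.0)        # oscillation of period 7 samples
x = trend + osc

# Low pass output: symmetric moving average, defined only where the window fits.
smooth = np.convolve(x, lam, mode="valid") # aligned with s[k:-k]

# Residual series: original minus its low frequency part.
residual = x[k:-k] - smooth

print(np.max(np.abs(smooth - trend[k:-k])))    # the average recovers the trend
print(np.max(np.abs(residual - osc[k:-k])))    # the residual keeps the oscillation
```

Here the separation is exact because a symmetric average reproduces a straight line and the window covers whole cycles of the oscillation. With real records the separation is only approximate and depends on the frequency response of the chosen weights.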

Even when the series looks stationary, there may be strong physical grounds 
for supposing that the observed series consists of a mixture of possibly stationary 
series. In fact, a single stationary series is an illusion suffered by mathematical 
statisticians! There may in fact be strong evidence that the oscillations in 
differing, but approximately known, frequency bands have quite different 
physical origins. In such a situation, filtering of the original series into several 
series using a bank of band pass filters may be an indispensable preliminary to 
spectral analysis. 


Part 2: STATISTICAL ASPECTS 


7. CONTINUOUS RECORDS AND ALIASING 

In many physical applications, the data consists of a continuous trace X(t) 
which may have been recorded by a variety of means e.g., photographically. 
One of the first decisions which will have to be made is whether to reconvert 
this into an electrical signal and do all subsequent calculations with an analogue 
computer. Alternatively we may decide to read off values of $X(t)$ at equidistant 
intervals $\Delta t$ and conduct the analysis digitally on the discrete values obtained. 
The arguments for and against these alternatives have been discussed at length 
by Blackman and Tukey [6] and Tukey [7]. The consensus of opinion amongst 
users of these techniques is that whilst the analogue methods are quick, they 
require a considerable amount of maintenance and lag considerably behind 
digital spectral methods in precision and sensitivity. 

Figure 1. Aliasing. 


Suppose that as a result of reading $X(t)$ in the interval $0 < t < T$ at intervals 
$\Delta t$ (where $n\,\Delta t = T$), we end up with a sequence of $n$ discrete observations 
$X_1, X_2, \ldots, X_n$. It is clear that reading at this interval has meant a certain 
loss of information. In fact, all direct information is lost for frequencies above 
what is called the Nyquist frequency, namely $\omega_N = \pi/\Delta t$ radians per second. 
Since $2\pi$ radians = 1 cycle, we may also write the Nyquist frequency as 
$f_N = 1/(2\,\Delta t)$ cycles per second (c.p.s.). Heuristically, since we need at least two points 
to fit one frequency, the fastest sine wave through the data points will be as 
shown in figure 1; this is in fact $\pi/\Delta t$ rad. per sec. Superimposed on the Nyquist 
frequency we have drawn a sine wave with twice its frequency; as far as the 
data points are concerned, however, these two are indistinguishable. Since no 
information about $X(t)$ is available between the data points, there is no means 
of directly estimating the amplitude of frequencies higher than the Nyquist 
frequency. Hence since $2\omega_N$ is indistinguishable from $\omega_N$ as far as the data are 
concerned, what we are actually able to measure at a frequency $\omega_N$ is not $f(\omega_N)$ 
but the latter confounded with all frequencies which are indistinguishable 
from $\omega_N$ (as far as the data are concerned). In general, if $f^*(\omega)$ is the spectral 
density corresponding to $X(t)$, then the spectral density $f(\omega)$ of the sampled 
trace is given by 

$$f(\omega) = \sum_{k} \left\{ f^*\!\left(\frac{2\pi k}{\Delta t} + \omega\right) + f^*\!\left(\frac{2\pi k}{\Delta t} - \omega\right) \right\}. \qquad (7.1)$$

In other words, the sampled spectrum is obtained by folding the unsampled 
spectrum about even multiples $2\pi k/\Delta t$ of the Nyquist frequency $\pi/\Delta t$ and 
adding these contributions in the range $(0, \omega_N)$. This has been called aliasing 
by Professor Tukey. It is clear that if we are to be able to measure $f^*(\omega)$ in the 
range $(0, \omega_N)$ from $f(\omega)$, then we must arrange so that $f^*(\omega) = 0$ (approximately) 
for $\omega > \omega_N$. 
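
The folding described above can be checked numerically. The following is a minimal sketch, not part of the original paper: the helper name `alias_frequency` and the value of the sampling interval are assumptions chosen for illustration. It folds a frequency into the range $(0, \pi/\Delta t)$ and confirms that a cosine at $\omega$ and one at $2\pi/\Delta t - \omega$ agree at every sampling instant.

```python
import math

def alias_frequency(omega, dt):
    """Fold an angular frequency omega (rad/s) into the range [0, pi/dt]."""
    omega_s = 2.0 * math.pi / dt          # sampling frequency, twice the Nyquist frequency
    r = math.fmod(abs(omega), omega_s)    # position within one sampling period
    return min(r, omega_s - r)            # reflect the upper half back about pi/dt

dt = 0.5                                   # sampling interval (assumed value)
omega_nyq = math.pi / dt                   # Nyquist frequency
omega_low = 0.3 * omega_nyq                # a frequency below the Nyquist frequency
omega_high = 2.0 * omega_nyq - omega_low   # its image folded about 2*pi/dt

# The two cosines coincide at every sampling instant t = k*dt.
max_gap = max(abs(math.cos(omega_low * k * dt) - math.cos(omega_high * k * dt))
              for k in range(50))
print(alias_frequency(omega_high, dt) / omega_nyq, max_gap)
```

Any power that the continuous trace carries at `omega_high` is therefore added to the estimate at `omega_low`, which is the confounding that (7.1) expresses.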

In any practical problem, three frequencies will be worth considering. 

(1) $\Omega_1$, the frequency at which the frequency response function of the 
recording instrument falls to about 1%-2% of its maximum value and beyond 
which it trails off even further. 

(2) A guess as to the frequency $\Omega_2$ for which the total power in $X(t)$ beyond 
$\Omega_2$ is of the order of 1%-2%. 

(3) A clear indication in one's mind as to the largest frequency $\Omega_3$ of interest 
in the study at hand, bearing in mind that the amount of reading effort will 
increase linearly with $\Omega_3$. 

Ideally, one would like to choose $\Delta t$ from $\Omega_3 = \pi/\Delta t$ so that no frequencies 
beyond those of interest are read. However, if $\Omega_2 > \Omega_3$ and if $f^*(\omega)$ is large 
for $\omega > \Omega_3$, then there will be aliasing difficulties. If $\Omega_2$ is known with some 
exactitude, then aliasing could be avoided by basing $\Delta t$ on $\Omega_2$ if $\Omega_2 > \Omega_3$. 
However, if $\Omega_2$ is much larger than $\Omega_3$, this will mean sampling at much closer 
intervals than is really necessary. In this case it would be necessary to filter 
the time-series (if possible before recording $X(t)$ in the first place) and thus 
ensure that the recorded trace has very little power above $\Omega_3$. 

A device which is sometimes useful if $\Omega_2$ is known fairly accurately is to 
choose $\Delta t$ so that $\tfrac{1}{2}(\Omega_2 + \Omega_3) = \pi/\Delta t$. This avoids the use of prefiltering and 
also avoids aliasing since if $\Omega_2 > \Omega_3$, all frequencies between $\tfrac{1}{2}(\Omega_2 + \Omega_3)$ and 
$\Omega_2$ will appear in the aliased spectrum between $\Omega_3$ and $\tfrac{1}{2}(\Omega_2 + \Omega_3)$, leaving the 
frequencies of interest between 0 and $\Omega_3$ untouched. 
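
As a small illustration of this device, the sketch below computes the sampling interval that places the Nyquist frequency midway between the largest frequency of interest and the frequency beyond which the power is negligible. The function names and the two numerical frequencies are hypothetical, chosen only to make the arithmetic visible.

```python
import math

def sampling_interval(omega2, omega3):
    """Delta t such that the Nyquist frequency pi/dt equals (omega2 + omega3)/2."""
    return 2.0 * math.pi / (omega2 + omega3)

def folded(omega, dt):
    """Image of omega after folding about pi/dt (valid for pi/dt <= omega <= 2*pi/dt)."""
    return 2.0 * math.pi / dt - omega

omega3 = 10.0   # largest frequency of interest, rad/s (hypothetical)
omega2 = 16.0   # frequency beyond which the power is negligible, rad/s (hypothetical)
dt = sampling_interval(omega2, omega3)
nyquist = math.pi / dt                     # midway between omega3 and omega2
print(dt, nyquist, folded(omega2, dt))     # the edge omega2 folds back onto omega3
```

With these values the interval is longer than the one obtained by basing the Nyquist frequency on the higher frequency alone (`math.pi / omega2`), which is the saving in reading effort that the device is meant to give.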


8. KERNELS AND WINDOWS 


We now consider the problem of estimation of the spectral density of the 
sampled trace. It will be convenient to regard the sampling interval $\Delta t$ as the 
unit interval (say 1 second) so that the Nyquist frequency $\omega_N = \pi$ radians 
per second or $f_N = \tfrac{1}{2}$ cycle per second. By suitable rescaling at the end of the 
calculations, the final answers may be adjusted so as to take account of the 
actual value of $\Delta t$. 

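Most writers on spectral analysis have regarded the problem as one of 
estimating $f(\omega)$ which is given by the discrete time analogue of (3.10) and which 
relates the spectrum to the a.c.f.: 

$$f(\omega) = \frac{1}{\pi} \left\{ 1 + 2 \sum_{k=1}^{\infty} \rho_k \cos \omega k \right\}. \qquad (8.1)$$

Also discussed in the literature is the problem of estimating the spectral mass 
over a bandwidth of extent $2h$, viz., 

$$\bar{f}(\omega) = \frac{1}{2h} \int_{\omega-h}^{\omega+h} f(y)\, dy = \frac{1}{\pi} \left\{ 1 + 2 \sum_{k=1}^{\infty} \rho_k \cos \omega k\, \frac{\sin kh}{kh} \right\}. \qquad (8.2)$$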
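
The factor $\sin kh / kh$ in (8.2) comes from averaging each cosine term of (8.1) over the band of width $2h$ centred on $\omega$. A short sketch of that step, written in the notation of this section and assuming the unit sampling interval:

```latex
\frac{1}{2h}\int_{\omega-h}^{\omega+h}\cos ky\,dy
  = \frac{\sin k(\omega+h) - \sin k(\omega-h)}{2kh}
  = \cos k\omega\,\frac{\sin kh}{kh}.
```

Each autocorrelation $\rho_k$ is thus damped by $\sin kh / kh$, a factor which tends to 1 as $h \to 0$, so that the band average reduces to the spectral density itself in the limit.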


Parzen ([8], [9]) has considered the estimation of more general quantities $J(A)$, 
which he has called spectral averages, where 

$$J(A) = \int_0^{\pi} A(\omega) f(\omega)\, d\omega \qquad (8.3)$$

and where $A(\omega)$ is a kernel or window which will be defined shortly. 

Rather than estimate $f(\omega)$, the spectral density, we might want to estimate 
$\sigma^2 f(\omega)$, the power spectrum, in which case the autocorrelations $\rho_k$ in formulae 
(8.1)-(8.3) would be replaced by the autocovariances $\gamma_k$. In many applications 
in the physical sciences, it is the power spectrum which is the quantity of 
direct interest. There seems to be some difference of opinion at present as to 
which of (8.1)-(8.3) we should be estimating in any specific problem 
irrespective of whether they are based on the $\gamma_k$ or $\rho_k$; this will be discussed in greater 
detail in a later section. The considerations which are developed in the next 
few sections will apply whichever of these quantities is being estimated. 

Figure 2. Spectral Windows for Fixed Number of Lags (m = 6). 


In a sample of $n$ observations, it is possible to calculate at most $N = n - 1$ 
autocorrelations. Suppose that we were faced with the hypothetical situation 
where these were known exactly. Then $f(\omega)$ may be approximated by the first 
$N$ terms of (8.1); denote this by $f_N(\omega)$. It may be shown that this is given by 

$$f_N(\omega) = \int_0^{\pi} f(y)\, K(\omega, y)\, dy \qquad (8.4)$$

where 

$$K(\omega, y) = \tfrac{1}{2} \{ u(\omega - y) + u(\omega + y) \} \qquad (8.5)$$

and 

$$u(y) = \frac{\sin (N + \tfrac{1}{2}) y}{\pi \sin \tfrac{1}{2} y}. \qquad (8.6)$$

This means that for finite samples, even if we knew the $\rho_k$ exactly, all we could 
hope to measure is a smudged average of $f(\omega)$. It is usual to call $K(\omega, y)$ a kernel; 
the word window has been applied to this by Blackman and Tukey [6]. The 
various kernels which have been used up to the present in spectral analysis 
have been tabulated in table 1 and some forms for $u(y)$ plotted in figure 2. 
As far as (8.4) and (8.5) are concerned, this kernel has the property that it is 
a maximum at $y = \omega$, falls off rapidly on either side, and most of its total area 
is contained within a band of $\pm\pi/N$ about $\omega$. Hence, the greater part of the 
smudging of $f(\omega)$ is in this region. In formula (8.3) we have introduced a kernel 
$A(\omega)$; the above considerations indicate that the base width of this kernel 
should be at least $2\pi/N$ where $N = n - 1$ but it will be shown in section 9 that 
its width must be considerably wider than this if reasonable spectral estimates 
are to be obtained. It is to be emphasised at this stage that the smudging refers 
to the spectrum in the recorded trace; it will be necessary to bear in mind what 
relation this has to the spectrum of the original time-series. From now on we 
shall regard the problem as being one of doing the best we can with the recorded 
spectrum. 

The convergence of (8.1) will depend on two things: 

(a) the nature of $f(\omega)$ 

(b) the form of the kernel used in the approximating formula (8.4). 

If $f(\omega)$ is a fairly smooth (irregular) function, then convergence will be fast 
(slow). However, it is possible to speed up the convergence of this series by 
multiplying the autocorrelations $\rho_k$ by weights $\lambda_k$; this has the effect of altering 
the shape of the kernel $K(\omega, y)$. Thus if $\lambda_k = 1 - k/N$, then 

$$u(y) = \frac{\sin^2 (N/2) y}{\pi N \sin^2 \tfrac{1}{2} y}$$

which is plotted in figure 2 where it may be seen that it will reduce the smudging 
of $f(\omega)$ by focussing the kernel $K(\omega, y)$ much more closely about $\omega$; this property 
will be called the focussing power of the kernel. As far as spectral analysis is 
concerned, the implications of this are that improved accuracy may be 
obtained even if all the autocorrelations in the sample were accurately known, 
by adjusting the focussing power of the kernel $K(\omega, y)$ i.e., by introducing the 
weights $\lambda_k$. Alternatively, we could obtain equal accuracy using a far smaller 
number of autocorrelations than $N$ (say $m < N$) by suitably weighting the 
autocorrelations. 
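
The effect of the weights on the window can be seen by evaluating $u(y) = \frac{1}{\pi}\{1 + 2\sum_k \lambda_k \cos ky\}$ directly. The sketch below is an illustration only: the function names are invented here, and $N = 6$ is chosen merely to match the number of lags quoted in the caption of figure 2. It compares unit weights with the weights $\lambda_k = 1 - k/N$ and checks the latter against the closed form $\sin^2(Ny/2) / (\pi N \sin^2(y/2))$.

```python
import math

def window(y, weights):
    """u(y) = (1/pi) * (1 + 2 * sum_k lambda_k cos(k y)) for weights lambda_1..lambda_N."""
    return (1.0 + 2.0 * sum(w * math.cos(k * y)
                            for k, w in enumerate(weights, start=1))) / math.pi

N = 6
truncated = [1.0] * N                                  # lambda_k = 1 for k <= N
bartlett = [1.0 - k / N for k in range(1, N + 1)]      # lambda_k = 1 - k/N

def bartlett_closed_form(y, N):
    """sin^2(N y / 2) / (pi N sin^2(y / 2))."""
    return math.sin(N * y / 2.0) ** 2 / (math.pi * N * math.sin(y / 2.0) ** 2)

for y in (0.3, 1.0, 2.5):
    print(y, window(y, truncated), window(y, bartlett), bartlett_closed_form(y, N))
```

On a grid of $y$ values the unweighted window takes negative values in its side lobes, whereas the weighted window is never negative; this is one concrete way in which the weights change the shape of the kernel.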

The term focussing power, although descriptive, is clearly a vague concept; 
in some way it depends on the nature of the derivatives of $K(\omega, y)$ at the peak 
frequency $y = \omega$. The measure which is normally used to characterise the shape 
of a kernel is its bandwidth. Since (8.4) may be regarded as a filtering operation 
on the true spectrum $f(\omega)$, we may speak of the bandwidth of this filter. One 
definition which is sometimes used is half the distance between the frequencies 
for which $u(y)$ falls to a half of its maximum value on either side of the maximum. 
The definition used by Parzen [9] is that bandwidth is half the base width of 
a rectangular window which has the same height and same area as $K(\omega, y)$. 
Yet another definition which has been used by Blackman and Tukey [6] (they 
call bandwidth, equivalent width) will be discussed in section 11. 


9. THE PERIODOGRAM 


In section 3, a plausibility argument (which may appeal to some and not 
to others) was given for the decomposition of the total variance or power into 
contributions corresponding to various frequencies. We now consider this in 
a little more detail. 

The starting point in the theory of spectral analysis is the much maligned 
Schuster periodogram which we define in the form 

$$I_n(\omega) = \frac{1}{\pi n} \left| \sum_{t=1}^{n} X_t e^{i\omega t} \right|^2 = \frac{1}{\pi n} \left\{ \left( \sum_{t=1}^{n} X_t \cos \omega t \right)^2 + \left( \sum_{t=1}^{n} X_t \sin \omega t \right)^2 \right\} \qquad (9.1)$$

where $\omega = \omega_j = 2\pi j/n$. 
If this function is plotted at the equidistant frequencies $\omega_j$, then this is 
known as a harmonic analysis of the data. This is the technique that one would 
normally use for investigating the harmonics of a fixed identifiable frequency 
$2\pi/n$ under the assumption that $X_t$ is a genuinely periodic function (i.e., $X_t$ 
repeats itself exactly every $n$ observations) apart from an additive error term. 
It should be mentioned that its misuse when these assumptions do not hold 
has been responsible for the acceptance of probably more spurious hypotheses 
than any other statistical or applied mathematical tool. 

The following consequences of (9.1) are algebraic identities which are true 
independently of any assumptions about $X_t$. Defining the sample 
autocovariance by 

$$c_k = \frac{1}{n} \sum_{t=1}^{n-k} X_t X_{t+k} \qquad (9.2)$$

it follows that 

$$I_n(\omega_j) = \frac{1}{\pi} \left\{ c_0 + 2 \sum_{k=1}^{n-1} c_k \cos \omega_j k \right\} \qquad (9.3)$$

and also the inverse relation holds: 

$$c_0 = \frac{1}{n} \sum_{t=1}^{n} X_t^2 = \frac{\pi}{n} \sum_{j=0}^{n-1} I_n(\omega_j). \qquad (9.4)$$

These fundamental identities express the fact that for any $X_t$ ($t = 1, \ldots, n$), 
the periodogram is equal to the Fourier transform of the sample covariance 
function and that the sample variance may be decomposed into contributions 
from each periodogram ordinate. 
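
Because these are algebraic identities, they can be verified on any set of numbers. The following sketch assumes the periodogram is normalised by $1/(\pi n)$ and the sample autocovariance is $c_k = \frac{1}{n}\sum_{t=1}^{n-k} X_t X_{t+k}$ (divisor $n$, no mean correction), which is the form the later remark on (9.2) indicates; the function names and the test series are illustrative choices, not part of the paper.

```python
import cmath
import math
import random

def periodogram(x, omega):
    """I_n(omega) = |sum_t x_t exp(i omega t)|^2 / (pi n), with t = 1..n."""
    n = len(x)
    s = sum(xt * cmath.exp(1j * omega * t) for t, xt in enumerate(x, start=1))
    return abs(s) ** 2 / (math.pi * n)

def autocov(x, k):
    """Sample autocovariance c_k with divisor n and no mean correction."""
    n = len(x)
    return sum(x[t] * x[t + k] for t in range(n - k)) / n

random.seed(0)
n = 16
x = [random.gauss(0.0, 1.0) for _ in range(n)]
c = [autocov(x, k) for k in range(n)]

j = 3
omega_j = 2.0 * math.pi * j / n
lhs = periodogram(x, omega_j)
# Right-hand side of (9.3): cosine transform of the sample autocovariances.
rhs = (c[0] + 2.0 * sum(c[k] * math.cos(omega_j * k) for k in range(1, n))) / math.pi
# Right-hand side of (9.4): the ordinates at j = 0..n-1 add up to the sample variance.
total = math.pi / n * sum(periodogram(x, 2.0 * math.pi * i / n) for i in range(n))
print(lhs, rhs, c[0], total)
```

The two sides of each identity agree to rounding error whatever series is supplied, which is the sense in which no assumptions about $X_t$ are needed.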

The difference between the statistical approach to time-series and harmonic 
analysis is the following: in the statistical approach it is assumed that the 
sample (realisation) has been drawn from a population (process) which we now 
assume to be stationary with the following properties: 

$$E(X_t) = 0, \qquad E(X_t X_{t+k}) = \gamma_k.$$

If we now take averages on both sides of (9.3), then 

$$E(I_n(\omega)) = \frac{1}{\pi} \left\{ \gamma_0 + 2 \sum_{k=1}^{n-1} \left( 1 - \frac{k}{n} \right) \gamma_k \cos \omega k \right\} \qquad (9.5)$$

and the spectral density $f(\omega)$ as defined by (8.1) is obtained from 

$$\lim_{n \to \infty} E(I_n(\omega_j)) = \sigma^2 f(\omega_j). \qquad (9.6)$$

It is to be noted that the definition given by Blackman and Tukey ([6] page 7) 
as $\lim_{n \to \infty} I_n(\omega_j)$ is misleading since it makes no reference to the fact that we 
are assuming that an ensemble (or population) exists with given statistical 
properties. 

Two observations are worth while making at this stage. 

(1) The definition (9.2) for the sample autocovariance has been chosen so 
that (9.3) is of the simple form shown; it differs slightly from the definition 
normally given, in which the divisor is $(n - k)$. However, we draw attention 
to the important observation made by Parzen [9] that (9.2) has a smaller mean 
square error than the unbiased estimate which is the one normally used. 

(2) It will pay in practice to define an autocovariance in which the sample 
mean is subtracted from each observation and sometimes a further factor due 
to a linear trend. This means that the autocovariances used for analysis would 
be of the form 

$$c_k = \frac{1}{n} \sum_{t=1}^{n-k} \{ X_t - \bar{X} - b(t - \bar{t}) \} \{ X_{t+k} - \bar{X} - b(t + k - \bar{t}) \} \qquad (9.7)$$

with perhaps $b = 0$ if it is considered unnecessary to remove a linear trend. 

The reason for this will become apparent later when it will be shown that if 
these quantities are not subtracted away before an analysis is made, their 
presence will affect the estimation of the spectral density at low frequencies. 
For the case of the non-zero mean, the definition of the autocovariances may 
be modified by subtracting $E(X_t)$ and then the sample autocovariances defined 
so that (9.3) still holds. 

In view of (9.6), it would seem that the natural estimate to take for $\sigma^2 f(\omega)$ 
is $I_n(\omega)$. It is one of the surprising features of the theory that this is not the 
case. As an empirical fact, this has been known for some time. Engineers have 
observed, for example, that harmonic analysis of noise which was known to 
be white produced a highly spiked spectrum. Statisticians and others have 
observed that harmonic analysis of random numbers (an artificial source of 
white noise) produces the same effect. 

Bartlett [10] pointed out that this behavior could be explained by the fact 
that 

$$\lim_{n \to \infty} \operatorname{Var}(I_n(\omega_j)) = \sigma^4 f^2(\omega_j). \qquad (9.8)$$

Hence (9.6) and (9.8) indicate that whereas for large sample sizes, the mean 
periodogram does tend to the spectral density, the variance of the fluctuations 
of $I_n(\omega_j)$ about $f(\omega_j)$ does not decrease to zero as $n \to \infty$ as it does for all well 
behaved estimators. In fact, for large $n$, the distribution of $I_n(\omega)$ is a multiple 
of a $\chi^2$ distribution with 2 degrees of freedom, independently of $n$. This means 
that in no statistical sense does $I_n(\omega)$ converge to $f(\omega)$ as $n$ becomes large. 
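
A small simulation makes this behaviour visible. The sketch below is illustrative only: the seed, the sample sizes, the number of replicates and the choice of frequency are assumptions made here, and the periodogram is taken with the $1/(\pi n)$ normalisation. It generates independent normal series of two lengths and records the mean and standard deviation of a single periodogram ordinate.

```python
import cmath
import math
import random
import statistics

def periodogram(x, omega):
    """I_n(omega) = |sum_t x_t exp(i omega t)|^2 / (pi n), with t = 1..n."""
    n = len(x)
    s = sum(xt * cmath.exp(1j * omega * t) for t, xt in enumerate(x, start=1))
    return abs(s) ** 2 / (math.pi * n)

def ordinate_samples(n, replicates, rng):
    """Periodogram ordinate at omega = 2*pi*(n//4)/n for independent white-noise series."""
    omega = 2.0 * math.pi * (n // 4) / n
    return [periodogram([rng.gauss(0.0, 1.0) for _ in range(n)], omega)
            for _ in range(replicates)]

rng = random.Random(1)
results = {}
for n in (32, 256):
    vals = ordinate_samples(n, 300, rng)
    # Store (mean, standard deviation) of the ordinate over the replicates.
    results[n] = (statistics.mean(vals), statistics.pstdev(vals))
    print(n, results[n])
```

For both series lengths the standard deviation of the ordinate stays comparable to its mean, as expected of a multiple of a $\chi^2$ variable with 2 degrees of freedom; lengthening the series eightfold does not tighten the estimate.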

Thus, harmonic analysis which is the technique appropriate for the analysis 
of periodic functions breaks down completely when applied to a statistical 
fluctuation. Historically, this explains the existence of a theory of spectral 
estimation and also provides the starting point in the development of the theory. 

An analogous state of affairs exists, as is well known, in the estimation of 
a probability density function; if the group interval for the histogram is too 
small, the estimated density function becomes very erratic. In this case it is 
necessary to average over fairly broad interval widths in order to obtain reliable 
results. Since from (9.5), $E(I_n(\omega)) \sim f_N(\omega)$, as defined in (8.4), the 
periodogram corresponds to a choice of group interval on the frequency scale which 
is too narrow; in the language of section 8, the bandwidth of the periodogram 
is far too small. 


10. GENERAL SPECTRAL ESTIMATES 


Bartlett [10] suggested in 1948 how harmonic analysis could be modified 
so as to lead to improved spectral estimates. This was the simple device of 
splitting the series into $p$ sets of $m$ terms so that $n = pm$. For each section of 
the series, conduct a harmonic analysis; call the periodogram ordinates for 
the $r$'th sub-series ($r = 1, 2, \ldots, p$) $I_{m,r}(\omega_j)$. If for each frequency, the average 
of these periodograms is taken over the $p$ sub-series, i.e., 

$$\bar{I}_m(\omega_j) = \frac{1}{p} \sum_{r=1}^{p} I_{m,r}(\omega_j), \qquad (10.1)$$

then it is clear that the variance of $\bar{I}_m(\omega_j)$ will be approximately given by 
$\sigma^4 f^2(\omega)/p$ or $\sigma^4 f^2(\omega)\, m/n$. It follows that for fixed $n$, we may make the variance 
as small as we please by taking $m$ small. 

It is very easily shown that apart from a few end corrections in going from 
one sub-series to the next, this is equivalent to calculating 

$$\hat{f}_n(\omega) = \frac{1}{\pi} \left\{ c_0 + 2 \sum_{k=1}^{n-1} \lambda_k c_k \cos \omega k \right\} \qquad (10.2)$$

with $\lambda_k = 1 - k/m$ ($k < m$) and $\lambda_k = 0$ ($k > m$). This leads us to the same 
type of considerations as were introduced in section 8 where it was found 
desirable to weight the $c_k$ even when these were known exactly (i.e., the $\gamma_k$) in 
order to improve the focussing power of the window. In general, (10.2) leads 
to an analogous formula to (8.4) viz., 

$$\hat{f}_n(\omega) = \int_0^{\pi} I_n(y)\, K(\omega, y)\, dy \qquad (10.3)$$

$$K(\omega, y) = \tfrac{1}{2} \{ u(\omega - y) + u(\omega + y) \} \qquad (10.4)$$

$$u(y) = \frac{1}{\pi} \sum_{k=-(n-1)}^{n-1} \lambda_k e^{-iky} \qquad (10.5)$$

(with $\lambda_0 = 1$ and $\lambda_{-k} = \lambda_k$), from which it follows that $\int_0^{\pi} K(\omega, y)\, dy = 1$. 

This means that weighting the sample autocovariances is equivalent to averaging 
or filtering of the sample periodogram. (The transition from (10.2) to (10.3) 
is a mathematical step which depends on the fact that if two functions $\lambda_k$ and 
$c_k$ are multiplied, their Fourier transforms are convolved). 

We have thus reached the following conclusions in section 8 and the present 
section: that both the smudging of the spectrum and its variance may be altered 
by suitable weighting of the autocovariances (or equivalently by altering the 
kernel or window $K(\omega, y)$). It is an unfortunate fact that if we try to reduce 
the smudging, we increase the variance and vice versa. We proceed to show 
by reference to the weights given by Bartlett why this is the case. 

It is useful to bear in mind that whereas the weights $\lambda_k$ in (10.2) are defined 
for $k = 1, 2, \ldots, n$, the weights in Bartlett's original estimate vanish for 
$k > m$. It is clear that the weights beyond a certain point make very little 
difference so that they may be ignored; if necessary, the ignoring of these 
weights could be compensated for by adjusting some of the earlier weights. 
However, there are distinct advantages to making $\lambda_k = 0$ ($k > m$) in that in 
this way, only $m$ sample autocovariances need to be calculated, with a saving 
in computation which may be quite considerable. Estimates for which $\lambda_k = 0$ 
($k > m$) will be termed truncated estimates. 

The $K(\omega, y)$ corresponding to Bartlett's estimate is easily shown to be 

$$K(\omega, y) = \frac{1}{2\pi m} \left\{ \frac{\sin^2 (m/2)(\omega + y)}{\sin^2 \tfrac{1}{2}(\omega + y)} + \frac{\sin^2 (m/2)(\omega - y)}{\sin^2 \tfrac{1}{2}(\omega - y)} \right\}. \qquad (10.6)$$

This window is plotted with other windows in figure 2 where it may be seen 
that it falls off rapidly from a maximum at the peak frequency and reaches 
zero at the points $y = \omega \pm 2\pi/m$. At points beyond this it will oscillate but these 
oscillations are of much smaller amplitude than the height of the main peak 
at $y = \omega$. It is apparent that any sensible definition of the bandwidth of the 
window must be of the same order of magnitude as, but somewhat smaller than, 
the half base width $2\pi/m$ so that it will be of the form $\lambda/m$ where $\lambda$ will depend 
on the particular definition of bandwidth used. 

It is clear also that increasing $m$ has the effect of reducing the bandwidth 
and hence increasing the focusing power of the kernel with a corresponding 
decrease in the smudging or distortion of the spectrum. However, from the 
expression for the variance, it may be seen that increasing $m$ has the effect of 
increasing the variance. These considerations hold true in general: distortion 
(smudging) and variance tend to oppose one another. 
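
The variance side of this trade-off can be illustrated with a short simulation. The sketch below is a hedged example rather than the procedure of the paper: it applies the truncated weights $\lambda_k = 1 - k/m$ to sample autocovariances computed with divisor $n$, uses white noise (for which the true spectrum is flat, so smudging plays no part), and the seed, series length and replicate count are arbitrary choices.

```python
import math
import random
import statistics

def autocov(x, k):
    """Sample autocovariance c_k with divisor n and no mean correction."""
    n = len(x)
    return sum(x[t] * x[t + k] for t in range(n - k)) / n

def bartlett_estimate(x, omega, m):
    """(1/pi) * (c_0 + 2 * sum_{k=1}^{m-1} (1 - k/m) c_k cos(omega k))."""
    total = autocov(x, 0)
    for k in range(1, m):
        total += 2.0 * (1.0 - k / m) * autocov(x, k) * math.cos(omega * k)
    return total / math.pi

rng = random.Random(2)
n, omega = 128, math.pi / 2
spread = {}
for m in (4, 16, 64):
    vals = [bartlett_estimate([rng.gauss(0.0, 1.0) for _ in range(n)], omega, m)
            for _ in range(100)]
    spread[m] = statistics.pvariance(vals)   # variance of the estimate over replicates
    print(m, spread[m])
```

The variance of the estimate grows roughly in proportion to $m$, so a narrow window bought by using many lags is paid for in stability.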

In table 1 are listed most of the weights $\lambda_k$ and corresponding kernels $K(\omega, y)$ 
(or equivalently $u(\omega)$) which have been suggested up to the present. From a 
general point of view therefore, there are two main points which require 
discussion: 

(1) The shape of the window i.e., which one of these weights or windows is 
best in any sense and what are the desirable features of a kernel or window? 

(2) What should be the bandwidth of the window in any particular situation 
since by adjusting this, we should be able to set up a balance between 
distortion and variance? 

There is a third issue which is linked with (2), namely, how should one define 
bandwidth in the first place? These considerations will be discussed in greater 
detail in sections 12 and 13. 


11. HEURISTIC TREATMENT OF THE SAMPLING PROPERTIES 
OF SPECTRAL ESTIMATES 


In this section, we give a rough method of deriving very simply the mean 
and variance of the general spectral estimates $\hat{f}_n(\omega)$ given by (10.3). These 
expressions will then be used to derive an approximation to the sampling 
distribution of $\hat{f}_n(\omega)$ based on the $\chi^2$ distribution. 

It is a basic result in harmonic analysis (see e.g., H. O. Hartley [11]) that 
when the time-series is normal and the observations are independent, the 
periodogram ordinates $I_n(\omega_j)$ ($\omega_j = 2\pi j/n$ and $j = 1, 2, \ldots, [n/2]$) are 
independently distributed as $\chi^2$ variables with 2 degrees of freedom. In the case 
where $j = 0$ and $j = n/2$ when $n$ is even, $I_n(\omega_j)$ is distributed as a $\chi^2$ with one 
degree of freedom. From the point of view of harmonic analysis $I_n(\omega_j)$ may 
be regarded as the regression sum of squares due to the $j$'th harmonic which, 
since it involves the fitting of 2 constants corresponding to the amplitude and 
phase, will be distributed as a $\chi^2_2$ under the null hypothesis of no harmonic 
regression. 

In the case where the time-series is autocorrelated, these results are true 
for $I_n(\omega_j)/f(\omega_j)$ asymptotically as $n$ becomes large and hence are reasonable 
approximations for finite $n$. In fact the moments given by (9.6) and (9.8) are 
the moments of this limiting $\chi^2$ distribution. 

Suppose $n$ is even ($= 2p$) and that we replace the integral in (10.3) 
approximately by a sum over the periodogram ordinates; then 

$$\hat{f}_n(\omega) \approx \frac{\pi}{p} \sum_{j=0}^{p} k_j(\omega)\, I_n(\omega_j), \qquad (11.1)$$

where $k_j(\omega) = K(\omega, y = 2\pi j/n)$. 

It now becomes clear why it is essential to remove a constant mean and linear 
trend before calculating the autocovariances. The sample mean, through $I_n(0)$, for 
example, in addition to dominating the contribution to $\hat{f}_n(\omega)$ at zero frequency 
will also contribute in view of (11.1) to frequencies close to zero and hence 
distort the picture given by $\hat{f}_n(\omega)$ of $f(\omega)$. If there is linear trend in addition, 
then this will contribute to $I_n(\omega_j)$ at all frequencies, but predominantly at low 
frequencies, and hence distort the spectrum in this region. The mean of $\hat{f}_n(\omega)$ 
may now be calculated using (9.6) giving 

$$E(\hat{f}_n(\omega)) \approx \frac{\pi}{p} \sum_{j=0}^{p} k_j(\omega)\, f(\omega_j)$$

where we now take $\sigma^2 = 1$ without loss of generality. 
As $n \to \infty$ this is given approximately by 

$$E(\hat{f}_n(\omega)) \approx \int_0^{\pi} K(\omega, y)\, f(y)\, dy. \qquad (11.2)$$

If we now assume that $f(\omega)$ does not vary too much over the bandwidth of 
the window $K(\omega, y)$ and observe that $K(\omega, y)$ has been chosen so that 
$\int_0^{\pi} K(\omega, y)\, dy = 1$, then 

$$E(\hat{f}_n(\omega)) \approx f(\omega). \qquad (11.3)$$

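A similar argument for the variance using the variance of $I_n(\omega)$ as given 
by (9.8) gives 

$$\operatorname{Var}(\hat{f}_n(\omega)) \approx \left( \frac{\pi}{p} \right)^2 \sum_{j=0}^{p} k_j^2(\omega)\, f^2(\omega_j) \approx \frac{2\pi}{n} \int_0^{\pi} K^2(\omega, y)\, f^2(y)\, dy. \qquad (11.4)$$

With the same assumptions as above, this becomes 

$$\operatorname{Var}(\hat{f}_n(\omega)) \approx \frac{2\pi}{n}\, f^2(\omega) \int_0^{\pi} K^2(\omega, y)\, dy. \qquad (11.5)$$

A similar argument for $n$ odd leads to the same formula as (11.5). It may be 
seen from (11.1) that $\hat{f}_n(\omega)$ is a weighted sum of $\chi^2$ distributions and hence it 
is reasonable to approximate its distribution by a $\chi^2$ distribution as was done 
in general for weighted sums of variances by Welch [13]. 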

In this case the equivalent degrees of freedom $f$ (denoted by E.D.F.) of 
the approximating $\chi^2$ distribution will be given by 

$$f = \frac{2 E^2(\hat{f}_n(\omega))}{\operatorname{Var}(\hat{f}_n(\omega))} = \frac{n}{\pi \int_0^{\pi} K^2(\omega, y)\, dy}. \qquad (11.6)$$

The variance and E.D.F. values corresponding to various choices of the kernel 
$K(\omega, y)$ are given in columns 2 and 3 of table 2. 

TABLE 2 
Properties of Spectral Windows 

Variance/f^2(ω)        E.D.F.              E.E.V.B.      E.I.E. 
16m/15n ~ m/n          15n/8m ~ 2n/m 
0.542m/n ~ m/2n        3.7n/m ~ 4n/m                     m/3.7 ~ m/4 
π/nh                   2nh/π                             π/2h 

E.D.F.: Equivalent degrees of freedom 
E.E.V.B.: Equivalent equi-variability bandwidth 
E.I.E.: Equivalent number of independent estimates 

It is interesting to apply this technique to find the E.D.F. of the sample 
variance using the decomposition (9.4). Using the expressions for the moments 
of $I_n(\omega)$ given by (9.8), it is possible to show that 

$$\text{E.D.F. (variance)} = \frac{n}{\pi \int_0^{\pi} f^2(\omega)\, d\omega} \qquad (11.7)$$

and it is easily verified that this reduces to $n$ when $f(\omega) = 1/\pi$ i.e., the 
observations are independent. 
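
For a truncated estimate the integral of $K^2$ can be evaluated from the lag weights themselves. The sketch below rests on an assumption that is not stated in the text: for frequencies away from 0 and $\pi$, Parseval's relation gives $\frac{2\pi}{n}\int_0^{\pi} K^2\,dy \approx \frac{1}{n}\sum_{k=-m}^{m}\lambda_k^2$. The cosine-bell weights $\lambda_k = \tfrac{1}{2}(1 + \cos \pi k/m)$ are likewise used only as an illustrative second example, since table 1 is not reproduced here; the function names are hypothetical.

```python
import math

def variance_ratio(weights, n):
    """Approximate Var(f_hat)/f^2 as (1/n) * sum_{k=-m}^{m} lambda_k^2, with lambda_0 = 1."""
    return (1.0 + 2.0 * sum(w * w for w in weights)) / n

def equivalent_degrees_of_freedom(weights, n):
    """E.D.F. = 2 E^2 / Var, i.e. 2 divided by the variance ratio."""
    return 2.0 / variance_ratio(weights, n)

n, m = 1000, 50
bartlett = [1.0 - k / m for k in range(1, m + 1)]
cosine_bell = [0.5 * (1.0 + math.cos(math.pi * k / m)) for k in range(1, m + 1)]
for name, w in (("bartlett", bartlett), ("cosine bell", cosine_bell)):
    print(name, variance_ratio(w, n), equivalent_degrees_of_freedom(w, n))
```

Under these assumptions the Bartlett weights give a variance ratio close to $2m/3n$, and the cosine-bell weights give $3m/4n$, the figure quoted in section 12 for the Blackman-Tukey window.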

A special case of (11.2) and (11.5) is important in what follows. This is the 
case when $K(\omega, y)$ corresponds to the kernel of a perfect band pass filter so that 

$$K(\omega, y) = \frac{1}{2h} \qquad (\omega - h \le y \le \omega + h) \qquad (11.8)$$

from which it follows that 

$$\lambda_k = \frac{\sin kh}{kh}. \qquad (11.9)$$

(11.8) was probably the first kernel suggested for spectrum analysis and was 
put forward by P. J. Daniell [14]. Using (11.2) and (11.4), it is possible to write 
down simple expressions for its mean and variance expressed as integrals of 
$f(\omega)$ and $f^2(\omega)$ over the width of the window. If the further assumption is made 
that $f(\omega)$ does not vary too much over the width of the kernel, then the variance 
of the spectral estimate corresponding to a "band pass" window may be written as 

$$\operatorname{Var}(\hat{f}_n(\omega)) \approx \frac{\pi}{nh}\, f^2(\omega). \qquad (11.10)$$


It is necessary to emphasize that the sampling theory of this section leans very 
heavily on the assumption of normality of the original time-series. Non-normality 
will tend to inflate the sampling variance so that the above expressions should 
only be regarded as rough guides. It would seem that existing testing and 
confidence procedures for spectra will be non-robust in exactly the same way 
as was shown for variances by G. E. P. Box [15]. 

We now introduce two further concepts. 


(1) Equi-variability definition of bandwidth 


Reference has already been made in section 8 to the definition of bandwidth 
used by Parzen [9], namely that this is half the base width of the rectangle 
with the same area and same height as the given window. A modification of 
this which seems to tie in more naturally with the needs of spectral analysis is 
the following: bandwidth is defined as half the base width of the rectangular 
or band pass window which has the same area and which in addition, gives 
rise to the same variance for the spectral estimate as the given window. We 
call this the equivalent equi-variability bandwidth of the window which we denote 
by E.E.V.B. Hence, equating (11.5) and (11.10) we obtain, since the E.E.V.B. = $h$, 

$$\text{E.E.V.B.} = \frac{1}{2 \int_0^{\pi} K^2(\omega, y)\, dy}. \qquad (11.11)$$

The units of E.E.V.B. in this expression are radians per second since we have 
agreed to call $\Delta t$ one second and then rescale later. In cycles per second, the 
corresponding value is obtained by dividing by $2\pi$. 


(2) Equivalent Number of Independent Estimates 


A very attractive property of the rectangular or band-pass windows is that 
since the windows themselves are non-overlapping and since they weight the 
periodogram ordinates which are almost independent, these estimates themselves 
will be almost independent. The number of independent rectangular estimates 
is clearly $\pi/2h$. To a fairly high degree of approximation, we can speak of the 
equivalent number of independent estimates for a window which is not rectangular 
by replacing $h$ by its E.E.V.B.; denoting this by E.I.E., it follows that 

$$\text{E.I.E.} = \pi / (2\, \text{E.E.V.B.}) = \pi \int_0^{\pi} K^2(\omega, y)\, dy. \qquad (11.12)$$

The values of E.E.V.B. (in cycles per second) and E.I.E. for the various kernels 
are to be found in columns 4 and 5 of table 2. The notion of E.I.E. will be very 
useful in section 14 when we consider the problem of comparing two or more 
spectra. 


12. THE BLACKMAN-TUKEY APPROACH 

A notable contribution to the theory of spectral analysis has been made by 
Blackman and Tukey [6]. In this section we summarise what we think are the 
main ideas in their approach and discuss critically certain aspects which we 
regard as being unsatisfactory. 


(1) Resolution 

The figure quoted for the "resolution" of the Blackman-Tukey window (2) is 

Resolution (in c.p.s.) = 1/m (12.1) 

where $m$ is the number of lags of the autocovariance function used in calculating 
the spectral density. Reference to table 2 shows that the E.E.V.B. (equivalent 
bandwidth) of this window in c.p.s. is $1/2m$. This means that "resolution" as 
defined by Blackman and Tukey is just the width of the rectangular window 
which gives the same variance for the spectral estimate as their window. In 
some vague sense, it will measure the degree of smudging to which the true 
spectrum is subjected. 


(2) Equivalent Degrees of Freedom 

The variance of the Blackman and Tukey window (2) given in table 2 is 
$3m/4n$ which they round to $m/n$ to account for non-normality. This leads 
to an E.D.F. of $8n/3m$ which in view of the rounding off in the variance becomes 
$2n/m$. Hence 

$$f = \text{E.D.F.} \approx 2n/m. \qquad (12.2)$$


(3) Design Considerations 


Suppose that it is required to design a spectral analysis in advance so that 
it has prescribed properties. First, we specify the number of degrees of freedom 
per estimate, say 8 or more. Then we specify the amount of distortion or 
smudging we are prepared to tolerate as measured by (12.1). This will deter- 
mine m, the number of lags, which may then be used to calculate n for a specified 
f from (12.2). 
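
The two design relations reduce to a pair of one-line calculations. The sketch below is a hypothetical helper, not a procedure given in the paper; it assumes that for a sampling interval other than one second the resolution in c.p.s. rescales to $1/(m\,\Delta t)$, and it rounds both results up to whole numbers.

```python
import math

def design(resolution_cps, edf, dt=1.0):
    """Number of lags m from (12.1) and series length n from (12.2)."""
    m = math.ceil(1.0 / (resolution_cps * dt))   # resolution = 1/(m*dt) c.p.s. (assumed rescaling)
    n = math.ceil(edf * m / 2.0)                 # f = 2n/m, solved for n
    return m, n

# Resolution of 1/64 c.p.s. with 10 degrees of freedom per estimate.
print(design(resolution_cps=0.015625, edf=10))
```

Halving the tolerated smudging doubles $m$, and for the same degrees of freedom it doubles the length of record that must be read as well.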


(4) Analysis Considerations 

In this case, $n$ is fixed so that all that can be done is to set up a "judicious 
compromise" between $f$ and $m$ using formula (12.2). It is to be noted that the 
actual formulae which are used for design and analysis criteria by Blackman 
and Tukey are different from those given above. However, the differences are 
only slight since the Blackman-Tukey formulae are expressed in a language 
more readily understood by engineers (but unfortunately not by statisticians). 
Instead of specifying $f$, they specify what is clearly equivalent (since it can only 
depend on $f$), namely the width of the $\alpha$% confidence interval for what is 
effectively the logarithm of the spectral density; in their language, this is the spread 
of the $\alpha$% confidence interval expressed in decibels. There are also minor 
modifications to take account of the fact that the data may be made up of several 
pieces. Apart from this, the Blackman-Tukey design relations are essentially 
contained in (12.1) and (12.2). 


(5) Prewhitening 


By prewhitening is meant the performing of a filtering operation on the data
$X_t$ using a filter with carefully chosen frequency response function, the actual
spectrum analysis being conducted on the output $Y_t$ from this filter. This
filtering operation may be performed before or after the data are recorded and
read at equidistant intervals. Thus the spectrum which is actually measured
is $f_Y(\omega)$, where $f_Y(\omega)$ is related to $f_X(\omega)$ by

$$f_Y(\omega) = |T(\omega)|^2 f_X(\omega) \tag{12.3}$$

and where $T(\omega)$ is the frequency response function of the filter. Once $f_Y(\omega)$
has been determined, then $f_X(\omega)$ may be obtained by transforming back using
(12.3). The frequency response function $T(\omega)$ is specially chosen so that the
spectrum $f_Y(\omega)$ is as nearly white or uniform as possible. It is clear that this
will only be possible if $f_X(\omega)$ is known to some degree of accuracy and Blackman
and Tukey suggest that a pilot spectrum analysis should be made to obtain this
information. Two broad reasons have been given for prewhitening.
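
As an illustration of (12.3), the sketch below uses the simplest possible prewhitening filter, $Y_t = X_t - aX_{t-1}$, for which $|T(\omega)|^2 = 1 - 2a\cos\omega + a^2$. The choice of this filter, the coefficient a and the function names are assumptions made for the example; the paper does not prescribe a particular filter.

```python
import math


def gain_squared(a: float, omega: float) -> float:
    """|T(omega)|^2 for the filter Y_t = X_t - a*X_{t-1} (T = 1 - a*exp(-i*omega))."""
    return 1.0 - 2.0 * a * math.cos(omega) + a * a


def prewhiten(x, a):
    """Apply Y_t = X_t - a*X_{t-1}; the output is one point shorter than the input."""
    return [x[t] - a * x[t - 1] for t in range(1, len(x))]


def recolor(f_y: float, a: float, omega: float) -> float:
    """Transform an ordinate of f_Y back to f_X by inverting (12.3)."""
    return f_y / gain_squared(a, omega)


if __name__ == "__main__":
    a = 0.8  # hypothetical coefficient, e.g. suggested by a pilot analysis
    for omega in (0.1, 1.0, 3.0):
        # Spectrum of a first-order autoregressive series with unit innovation variance.
        f_x = 1.0 / (2.0 * math.pi * gain_squared(a, omega))
        f_y = gain_squared(a, omega) * f_x  # relation (12.3): constant, i.e. white
        print(round(f_y, 6), round(recolor(f_y, a, omega) / f_x, 6))
```

For this idealised input the filtered spectrum is exactly flat, which is the situation prewhitening aims to approximate.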

(a) Errors will be incurred during transmission in the electrical equipment,
during recording on tape or photographic paper and during the reading of the
ordinates. There is every reason to believe (and this is supported by a certain
amount of experimental evidence) that the “error spectrum” will be white
and hence will appear in the form of a strip of finite height on top of the spectrum
which is to be estimated.

One reason which is given for prewhitening is that if the spectrum $f_Y(\omega)$
is nearly white, the effect of the error spectrum will be distributed evenly for
all frequencies. If $f_X(\omega)$ itself were estimated, then small spectral ordinates
would become submerged in the error spectrum.

(b) If the spectrum being estimated is not white, then there will be
considerable distortion in a region where $f(\omega)$ changes rapidly over the width of
the spectral window. In particular, a considerable contribution to the estimate
corresponding to a given peak frequency might arise from “leakage” from a
frequency at some distance away from this, if $f(\omega)$ had a peak at this point and
simultaneously the spectral window had a ripple or side lobe. In other words,
by standardising most spectra so that they are as smooth and uniform as possible,
standard spectral windows may be used.

We now proceed to make some detailed criticisms of this approach to spectral
estimation.


(1) Resolution 


We do not agree with Blackman and Tukey's use of the word “resolution” to
mean the equivalent bandwidth of the window as given in table 2. By resolution
we mean the amount of distortion or smudging of the spectrum, and this is
clearly not bandwidth although it is related to bandwidth in some manner.
Hence resolution is just as much a property of the spectrum being estimated as
of the width of the window being used. Even if one were prepared to equate
resolution to bandwidth, windows with the same equivalent bandwidth would
give spectral estimates with different properties. Thus windows 3 and 5 of
table 2 based on m autocovariances and window 6 based on 2m autocovariances
have the same equivalent bandwidth but different shapes and hence different
resolving properties.


(2) Degrees of Freedom 


The variance and degrees of freedom will only be meaningful descriptions of 
the statistical reliability of the estimate provided we have been fortunate 
enough to avoid a great deal of distortion or smudging. In some situations, 
the error could be dominated completely by the distortion in which case the 
variance is almost meaningless. It is to be observed also that the variance 
is only a property of a single spectral estimate; in order to say something about 
all the spectral estimates, it will be necessary to know the covariances of
adjacent estimates. Reference to table 2 shows that alternate spectral estimates
are effectively independent for the Blackman-Tukey window (2). They suggest
therefore ([6], p. 147) that it is only alternate estimates which are completely
“resolved”. In our opinion, resolution has nothing whatsoever to do
with the fact that alternate estimates are statistically independent. It is implicit
also in the Blackman-Tukey approach that the spectral density is only calculated
at a fixed number of ordinates k/m (k = 0, 1, ..., m). This means that in a
situation where the spectrum has a large peak at a point exactly half way
between these frequencies, the estimates at the fixed ordinates might be poor
estimates of this peak. Hence we see no objection to sliding the spectral window 
along the frequency scale and calculating spectral ordinates at any frequency 
we choose. These ordinates may be statistically correlated but we regard this 
as being unimportant if we stand a better chance of picking out a peak. This 
is an important point because it brings out the difference between the band- 
width (of the window) and resolution (of the spectrum as conditioned by the 
bandwidth of the window). Blackman and Tukey have equated bandwidth to 
resolution. 
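
The suggestion of sliding the window along the frequency scale amounts to evaluating the weighted cosine transform of the autocovariances at whatever frequency is wanted. The sketch below is one way to do this; the triangular lag window $1 - k/m$ and the function name are assumptions chosen for brevity, not the windows of table 2.

```python
import math


def spectral_estimate(c, m, omega):
    """Windowed spectral estimate at an arbitrary frequency omega.

    c     : autocovariances c[0], c[1], ..., with at least m entries
    m     : truncation point (lags k >= m receive zero weight)
    omega : frequency in radians per sampling interval, 0 <= omega <= pi
    The triangular lag window 1 - k/m is an assumption made for this sketch.
    """
    total = c[0]
    for k in range(1, m):
        total += 2.0 * (1.0 - k / m) * c[k] * math.cos(omega * k)
    return total / (2.0 * math.pi)


if __name__ == "__main__":
    c = [1.0, 0.6, 0.2, -0.1, -0.2]  # hypothetical autocovariances
    # Ordinates on a grid much finer than the m + 1 fixed frequencies.
    for j in range(0, 21, 5):
        omega = math.pi * j / 20
        print(round(omega, 3), round(spectral_estimate(c, 5, omega), 4))
```

Neighbouring ordinates computed this way are correlated, as the text notes, but a peak lying between the fixed ordinates is less likely to be missed.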


(3) Design Considerations 


The Blackman-Tukey design considerations seem to take no account of the 
fact that some spectra will have quite different shapes from others and hence 
will require much longer series to estimate them. This is a surprising result. 
If a spectrum analysis is designed using a preassigned bandwidth for the window 
then the extent to which this is going to be successful or not will depend on how 
successful the choice of bandwidth was in relation to the shape of the spectrum 
which we eventually measure. If the spectrum is fairly smooth, then this band- 
width will have been wasteful in the sense that a much wider bandwidth would 
have done just as well. If the spectrum is highly fluctuating, then the chosen 
bandwidth will do badly because it is too wide. If it is argued that in such an 
eventuality, we could change the bandwidth to meet the needs of the record 
being examined, then by doing so we may alter the degrees of freedom of each 
estimate drastically so that the properties of the estimates will differ
considerably from what we had designed them to be.


(4) Analysis Considerations


The previous criticisms become more acute in the case of the analysis of a
record of fixed length which may or may not have been decided on the basis
of some design considerations. In the case of analysis, it is necessary to balance
bandwidth and degrees of freedom against one another. Blackman and Tukey
state that the “resolution” that is required should be specified taking care
that there are “enough” degrees of freedom per estimate. They state ([6],
page 11) that normally, the truncation point m would be chosen to be about
5%-10% of the length of the record.

We do not agree with this on the grounds that this could lead to a truncation 
point which is far too small i.e., a window whose bandwidth is much too large. 
It is our opinion that in many situations, it is possible to select a truncation 
point based on the observed autocovariance function. It is suggested that this
should always be plotted for lags up to 25%-30% of the total number of
observations; in certain instances it will be possible to conclude that the
autocovariances beyond a certain point are sufficiently small to be neglected. In
other situations, one may have to look at more autocovariances and in other 
situations where the autocovariances fluctuate considerably, it may be very 
difficult to make any such decision. In this case, it would be wise to calculate 
spectra based on several choices of truncation point or bandwidth. In the 
Blackman-Tukey approach, it is implicit that the autocovariance function is 
not even plotted; the bandwidth is specified and the corresponding spectrum 
obtained. 
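
The empirical procedure just described can be mimicked numerically. The sketch below computes the sample autocovariances out to a chosen fraction of the record and reports the lag beyond which all plotted autocorrelations are small. The threshold `tol`, the divisor n and the function names are assumptions made for illustration; the paper relies on visual inspection of the plot rather than on a fixed rule.

```python
def autocovariances(x, max_lag):
    """Sample autocovariances c_0..c_max_lag (divisor n) after removing the mean."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def truncation_point(x, fraction=0.3, tol=0.1):
    """Lag beyond which every plotted |c_k / c_0| is below tol.

    Returns None when the autocorrelations have not died out within the
    plotted range, the case in which several truncation points should be tried.
    """
    max_lag = int(fraction * len(x))
    c = autocovariances(x, max_lag)
    if c[0] == 0:
        raise ValueError("series has zero variance")
    r = [ck / c[0] for ck in c]
    m = max_lag
    while m > 0 and abs(r[m]) < tol:
        m -= 1
    if m == max_lag:
        return None
    return m + 1


if __name__ == "__main__":
    spike = [1.0] + [0.0] * 19      # autocorrelations negligible at once
    alternating = [1.0, -1.0] * 10  # autocorrelations never damp out
    print(truncation_point(spike, 0.25), truncation_point(alternating, 0.25))
```

A result of `None` corresponds to the situation in the text where no decision can be made from the plot and spectra should be computed for several choices of m.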

Another difficulty about specifying the bandwidth in advance is the following. 
Suppose that two records are available, the length of one being twice that of 
the other. How does one choose a bandwidth for the two series? It is clear that 
we should be able to use a narrower bandwidth for the longer series since we 
have a larger number of degrees of freedom to work with. The question then
arises as to whether the bandwidths should be in the ratio $2 : 1$ or $\sqrt{2} : 1$ or
$C \log_e 2 : 1$ or something else. The fixed bandwidth approach clearly cannot
give an answer to this question.

On plotting the autocovariances, it might become clear that the shorter 
series gave rise to an autocovariance function which damped out much more 
slowly than that of the longer series. This might result in choosing a larger 
truncation point for the shorter series provided the number of degrees of freedom 
per estimate was not too small. 

It is to be concluded, therefore, that the choice of bandwidth should be
governed by the length of the record and by some knowledge of the spectrum
which is being estimated.


(5) Prewhitening


Reason (a) for prewhitening seems to have much to commend it provided
the prewhitening is done before the continuous trace X(t) is recorded. If the
prewhitening is done after the trace has been read, then all the errors are
recovered in transforming back from the whitened to the original spectrum.
Reason (b) is motivated by the fact that if we have to estimate spectra which
are always nearly white, then as far as analysis is concerned, the choice of
the truncation point can be taken to be some percentage of the total number
of observations. Prewhitening, therefore, does remove some but not all of the
objections made above about the dependence of bandwidth on spectral shape.
It does depend, however, on knowledge of the spectrum being estimated and
this information will have to be derived from a pilot spectrum analysis. This
may be regarded as being an equivalent step to finding where to choose the
truncation point from the autocovariance function. On the basis of where the
autocovariance function damps out, we hope to make a sensible guess as to
the maximum bandwidth such that it is reasonable to assume that the spectrum
is effectively constant over the bandwidth of the window.

The above criticisms of the design considerations remain unaltered by
prewhitening.


13. DESIGN AND ANALYSIS CRITERIA


(1) Bandwidth Criteria 


It has been shown in sections 10 and 11 that a compromise must be set up 
between smudging or distortion on the one hand and reliability or variance 
on the other. This compromise may be summarized in the relationship 


$$\text{Bandwidth} \times \text{Variance} = \text{Constant} \tag{13.1}$$


for a given length of record. This was called an uncertainty principle by
Grenander and Rosenblatt ([16], [17]) by analogy with the famous Heisenberg
uncertainty principle in quantum mechanics. The definition of bandwidth used
by Grenander and Rosenblatt was the standard deviation of the spectral window
regarded as a probability density function and hence is different from any
of the definitions used in this paper. In [9] Parzen has derived an uncertainty
principle based on what may be called a geometrical definition of bandwidth.
In section 11 of this paper we have based another uncertainty principle on the
equi-variability definition of bandwidth. It is interesting to observe that since
the variance is inversely proportional to the E.D.F. and the bandwidth is
inversely proportional to the equivalent number of independent estimates
(E.I.E.), it follows that for the definition of bandwidth given in section 11,
(13.1) may be written in the form


$$\text{Equivalent Degrees of Freedom Per Estimate} \times \text{Equivalent Number of Independent Estimates} = n \tag{13.2}$$


the constant in this case being equal to n, the sample size or total degrees of
freedom.

In order to design a spectral analysis based on the results of section 11, four 
stages are necessary. 
(i) Select a kernel or window. 
(ii) Select the degrees of freedom required per estimate. 
(iii) Select the equivalent bandwidth required; this then determines m from 
column 4 of table 2. 
(iv) If f and m have been determined, then the sample size may be obtained 
from column 3 of table 2. 



It is to be observed that these design relations include the Blackman-Tukey
relations as a special case and hence the criticisms made in the last section
apply equally well to these. How should the bandwidth be made to depend
on n and $f(\omega)$?


(2) Mean Square Error Criteria 


The bandwidth criteria mentioned above are sensible because they do make 
for a compromise between distortion and reliability. However, they do not 
help us in saying whether 80%, say, of the effort should be employed in over- 
coming distortion and 20% in dealing with variance. One way of dealing with 
this is to consider some figure of merit which it is desirable to achieve. Three 
are available in spectral analysis at the present. 


(i) Mean Square Error at each Frequency 


This was introduced by Grenander and Rosenblatt [18] and Parzen [24]
and may be defined as follows.


$$\text{Mean square error} = E\{f_n(\omega) - f(\omega)\}^2 = \text{Variance} + (\text{Bias})^2. \tag{13.3}$$


This criterion may be discounted in future work since most people are agreed
that the properties of an estimate at a single frequency are not relevant or
important.


(ii) Integrated Mean Square Error


This consists of integrating (13.3) over all frequencies and was introduced
by Lomnicki and Zaremba [19].


(iii) Maximum Mean Square Error


This has been introduced recently by Parzen [9] (and is to be discussed in
a later paper [20]) and consists of evaluating


$$E\{\max_{\omega}\, (f_n(\omega) - f(\omega))^2\}. \tag{13.4}$$


All these figures of merit, for a given spectral window, depend on three quantities 


(a) Sample size 

(b) Truncation point or bandwidth 

(c) Some property of the spectrum, e.g., $f^2(\omega)$ for (i), $\int f^2(\omega)\,d\omega$ for (ii)
and a very interesting quantity for (iii) which will not be discussed here.


It is clear that these quantities should appear in any sensible criterion and
it is interesting to observe that $\int f^2(\omega)\,d\omega$ is the quantity which appears in
the E.D.F. of the sample variance given in section 11.

For a given figure of merit, an optimum value of m may be chosen, say $m_{opt}$,
by minimising with respect to m. It follows that $m_{opt}$ will be a function of n
and $f(\omega)$. When $m_{opt}$ is substituted for m, then the criterion will also become
a function of n and $f(\omega)$. Hence, for design considerations, we may proceed
as follows:


(i) Specify $f(\omega)$ approximately.
(ii) Choose one of the 3 figures of merit quoted above and give it a numerical
value.

Hence (i) and (ii) will determine n, from which $m_{opt}$ may be calculated using
$f(\omega)$. For analysis, the procedure is much easier:


(i) Given n and $f(\omega)$, $m_{opt}$ may be determined.
(ii) Given $m_{opt}$ and $f(\omega)$, the numerical value of the figure of merit achieved
may be calculated.
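
The minimisation over m can be illustrated for the mean square error at a single frequency. The sketch below assumes a variance of the form $k(m/n)f^2(\omega)$, as used for windowed estimates in this paper, together with a bias of the form $b/m^2$. The bias law, the constant b (which would depend on the window and on the curvature of the spectrum) and the numerical values are hypothetical and are introduced only to make the calculation concrete.

```python
def mean_square_error(m, n, f, k, b):
    """Variance + Bias^2 under the assumed forms described above."""
    variance = k * (m / n) * f ** 2  # variance proportional to m/n
    bias = b / m ** 2                # assumed bias law (hypothetical)
    return variance + bias ** 2


def optimal_lags(n, f, k, b):
    """Value of m minimising mean_square_error.

    Setting the derivative k*f^2/n - 4*b^2/m^5 to zero gives
    m_opt = (4*b^2*n / (k*f^2)) ** (1/5).
    """
    return (4.0 * b ** 2 * n / (k * f ** 2)) ** 0.2


if __name__ == "__main__":
    n, f, k, b = 1000, 1.0, 0.5, 50.0  # illustrative values only
    m_opt = optimal_lags(n, f, k, b)
    print(round(m_opt, 2), round(mean_square_error(m_opt, n, f, k, b), 5))
```

Under these assumed forms $m_{opt}$ grows only as the fifth root of n, which shows concretely how such a criterion ties the truncation point to both the sample size and the spectrum.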


It is clear that a pilot investigation is required in order to obtain some knowl- 
edge about f(w). The figure of merit may also be used to compare windows. 
However, it becomes clear from Professor Parzen’s work that the ranking of 
the windows according to their performance on a particular figure of merit 
depends on which figure of merit is used. It is possible to discount windows 1 
and 7, but all the others seem to qualify for some consideration. For a detailed 
discussion of mean square error criteria in spectral analysis, the reader is referred 
to Parzen [20]. 

Clearly, there can be no answers to these questions in general since all sta- 
tistical figures of merit are ad hoc in the sense that they reflect different degrees 
of importance which are attached to various properties of the spectrum, i.e., 
they involve a loss function which depends on frequency. Questions arise such 
as the following: should equal weights be attached to all frequencies? If certain 
frequency bands are more important than others, then clearly more effort 
should be concentrated on these. If, for example, one were interested in locating 
a peak in a particular region, then considerations of bandwidth are likely to 
be more useful than mean square error criteria. We may be able to detect the 
peak with a small bandwidth and then use a wider bandwidth for the remainder 
of the spectrum whereas a mean square error criterion will require us to choose 
a fixed bandwidth once and for all. 

Reference has been made in section 8 to the fact that differences of opinion 
exist as to exactly what should, or can, be estimated in spectrum analysis. 
We conclude this section by making some comments on certain criticisms of 
mean square error criteria which have been made recently by Tukey [21]. 
These amount to saying that it is unrealistic to take some average of the
difference between $f_n(\omega)$ and $f(\omega)$ as a criterion of merit since we can never hope
to measure $f(\omega)$ at individual frequencies $\omega$. We have recognized this fact in
section 8 by observing that the best we can do is to estimate $f(\omega)$ over kernels
with bandwidth of the order of 1/n. We would agree, therefore, with Parzen [9]
that it is sensible to compare $f_n(\omega)$ with spectral averages over a bandwidth of
this order. However, Tukey [21] implies that the only sensible thing which can
be said is that over the bandwidth of the spectral window used (of order 1/m
and in general much wider than 1/n), the estimated spectrum intersects the
“true” spectrum. Taking this to its extreme, the estimate based on zero
autocovariances, i.e., the sample variance, intersects the “true” spectrum. This is
an interesting but nevertheless not a very useful observation. In our opinion, 
any sensible criterion should be able to express the bandwidth of the window 
in terms of the sample size and the smoothness properties of the spectrum; 
the mean square error criteria give an answer to this question.


14. THE LOGARITHMIC TRANSFORMATION OF THE SPECTRAL DENSITY


It has been shown in section 11 that the variance of a general spectral
estimate may be written in the form


$$\operatorname{Var}\{f_n(\omega)\} \approx v(m, n)\, f^2(\omega) \tag{14.1}$$


where m is the number of autocorrelations, n the number of observations and
$v(m, n) = km/n$, where k depends on the window used for the spectral
estimation.


Jenkins and Priestley [22] have made the obvious observation that

$$\operatorname{Var}\{\log_e f_n(\omega)\} \approx v(m, n) \tag{14.2}$$


which means that to a reasonable degree of accuracy, the sampling properties
of $\log_e f_n(\omega)$ are independent of frequency; this has several useful consequences.
It will be assumed:
(i) That corresponding to any spectral window, only the effective number
of independent estimates (E.I.E.) will be used for the following analysis.
(ii) That the number of degrees of freedom 2n/(km) is sufficiently large for
the distribution of $f_n(\omega)$ to be taken as normal. It is to be observed that
in addition to stabilising the variance, the logarithmic transformation
of $f_n(\omega)$ will produce a distribution which is much closer to normal than
the original $\chi^2$.
(iii) That the errors in $f_n(\omega)$ are dominated by the variance and not by the
smudging. This means that the assumption that $f(\omega)$ does not vary
much over the bandwidth of the window must be reasonable.


It is easily seen then that approximate confidence intervals for $f(\omega)$ may
be obtained using the relation


$$\exp\{\log_e f_n(\omega) \pm z_\alpha \sqrt{v(m, n)}\} \tag{14.3}$$


where $z_\alpha$ is the upper α/2 % limit of the normal distribution.

However, it is better (as has also been indicated by Blackman and Tukey)
to plot $\log_e f_n(\omega)$, in which case the confidence limits are just the exponent
of the exponential in (14.3). We are of the opinion that confidence intervals
for single spectra are not very important nor useful since in addition to the 
three assumptions listed above, they also depend heavily on stationarity and 
normality assumptions. It is usually far more important to see that when the 
experiment is repeated, a spectrum is obtained which bears a reasonable re- 
semblance to the first. 
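
For readers who nevertheless want the limits implied by (14.3), the sketch below evaluates them and also expresses the half-width on the decibel scale. It assumes $v(m, n) = km/n$ with a window constant k supplied by the user; the function names and the numerical inputs are illustrative, and the normal quantile is taken from Python's standard library.

```python
import math
from statistics import NormalDist


def log_spectrum_limits(f_hat, m, n, k, alpha=0.05):
    """Approximate limits for f(omega) from (14.3), with v(m, n) = k*m/n."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 point
    half = z * math.sqrt(k * m / n)              # half-width on the log_e scale
    return f_hat * math.exp(-half), f_hat * math.exp(half)


def half_width_db(m, n, k, alpha=0.05):
    """The same half-width expressed in decibels (10 log10 of a power ratio)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 10.0 * math.log10(math.e) * z * math.sqrt(k * m / n)


if __name__ == "__main__":
    # Hypothetical analysis: n = 400 observations, m = 20 lags, k = 1.
    print(log_spectrum_limits(2.0, 20, 400, 1.0), half_width_db(20, 400, 1.0))
```

On the logarithmic scale the band has the same width at every frequency, which is the practical benefit of plotting $\log_e f_n(\omega)$.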


Comparison of Spectra 


The logarithmic transformation may also be used to compare two or more 
spectra. Thus, a designed experiment may have been carried out in which 
several spectra have been obtained under different conditions and it is required 
to ask whether these are reasonably homogeneous. Alternatively, two spectra 
may have been derived from the first and second halves of a long series and 
it is required to ask the question as to whether they differ by more than is 
reasonable. This may be regarded as a crude test for stationarity over the 
record.

It is assumed that l effectively independent spectral estimates are available
for each spectrum so that it will be convenient for all spectral analyses to be
based on the same number of lags, although the number of terms in each series
may differ slightly. In the case of two spectra, it is easily seen that a test of
homogeneity amounts to comparing


$$D = \sum_{j=1}^{l} \log_e \left\{ \frac{f_{n_1}(\omega_j)}{f_{n_2}(\omega_j)} \right\} \Big/ \sqrt{l\,\{v(m, n_1) + v(m, n_2)\}}, \tag{14.4}$$

where $\omega_j$ ($j = 1, \ldots, l$) are the frequencies of the l effectively independent
estimates, with the unit normal distribution. In general, a weighted
analysis of variance could be conducted on the logarithm of the spectral density.
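
A comparison of two spectra along these lines might be coded as follows. The sketch takes (14.4) to be the sum of the l log-ratios divided by its approximate standard deviation, with $v(m, n) = km/n$; this reading of the formula, the window constant k and the function name are assumptions made for the example.

```python
import math


def homogeneity_statistic(f1, f2, m, n1, n2, k):
    """Statistic D for two spectra estimated at the same l frequencies.

    f1, f2 : estimates at the l effectively independent frequencies
    m      : common number of lags; n1, n2 : lengths of the two series
    k      : window constant in v(m, n) = k*m/n (assumed known)
    """
    if len(f1) != len(f2):
        raise ValueError("both spectra need the same number of ordinates")
    l = len(f1)
    v1, v2 = k * m / n1, k * m / n2
    total = sum(math.log(a / b) for a, b in zip(f1, f2))
    return total / math.sqrt(l * (v1 + v2))


if __name__ == "__main__":
    first_half = [4.0, 2.5, 1.2, 0.6]    # hypothetical estimates
    second_half = [3.6, 2.9, 1.0, 0.7]
    print(round(homogeneity_statistic(first_half, second_half, 10, 100, 100, 1.0), 3))
```

Values of D far outside the usual range of a unit normal variable would cast doubt on homogeneity, for instance on stationarity when the two spectra come from the two halves of one record.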

Apart from simplifying the distribution theory, there is a further advantage
to working with the logarithm of $f_n(\omega)$. By analogy with the properties of the
sample variance in the case of independent observations, the distribution of
$f_n(\omega)$ will be heavily dependent on the assumption of normality whereas the
distribution of $\log_e f_n(\omega)$ will be less sensitive to such an assumption.

It is gratifying that a number of statistical arguments lead us back to consider 
the natural way of measuring power, which is the one used extensively by 
electrical engineers. This is to work with the logarithm of power or with the 
decibel scale. It follows that on this scale, it is ratios of power which are im- 
portant, the relation between decibels and power being given by 


$$\text{Decibels} = 10 \log_{10} (\text{ratio of power}).$$


Hence an increase of power by a factor of 2 is equivalent to a change of about
3 decibels.


15. SUMMARY 


Possible users of spectral techniques may be a little bewildered after reading
section 13 at the variety of methods open to them. However, in this sense,
spectral analysis is no different from any other statistical technique. We now proceed
to summarise the various stages in the conduct of a design or analysis and 
we also emphasize which aspects we regard to be important. 

(1) Preliminary considerations: It will be very important to know what
relation the estimated spectrum bears to the spectrum of interest.
Most of the important errors may have been introduced before the statistical 
considerations have begun. The interpretation of the spectrum is then dictated 
almost entirely by non-statistical considerations. 

(2) Choice of Spectral Window: The most important feature of spectral esti- 
mation is that some sort of window should be used with a bandwidth considerably 
greater than 1/n where n is the total number of observations. 

There is still some doubt as to what constitutes a good spectral shape but 
windows 3 and 6 seem to have attractive properties. 


(3) Analysis Considerations 


(a) Calculate the autocovariances having removed a mean and possibly a 
linear trend. For fixed n, all that is needed is to select the number of lags. 
(b) There seem to be three ways of doing this: 
(i) Plot the autocovariance function up to 25%-30% of the total sample
size and determine a reasonable truncation point empirically.

(ii) Specify a bandwidth for the spectral window chosen taking care that
there are enough degrees of freedom per estimate.

(iii) Choose m from a mean square error criterion using some knowledge of 
f(w) obtained from (i) or from a pilot spectral analysis. 

(c) For design considerations, n may be determined as described in section 13. 


Ultimately there is not going to be a great deal to choose between the above 
three methods of choosing m from a practical point of view since one will be 
wise to base spectra on a few selections of m anyway. The author would prefer 
to work with (i) backed up by (iii) on the grounds that he is unable to specify 
a bandwidth in a vacuum unless possibly there are special objectives which 
restrict the choice of bandwidth. In addition to basing spectra on a few choices 
of m, it is suggested that one should feel free to calculate spectral ordinates at 
any frequencies. 

In the last resort, if it is difficult to make sense of the spectrum from a physical 
point of view, then the more refined statistical considerations are irrelevant. In 
particular, if taking the two halves of the same series gives widely differing 
answers or if the next experiment produces a different spectral shape, then one 
has far greater problems than statistical ones.


1. INTRODUCTION


The problem of estimating, from a finite length of record, the spectrum of 
a stationary time series is important for many reasons. 

The theory of the transmission or detection of signals in the presence of 
noise attempts to provide a basis for the design of systems which are optimum 
(according to some criterion involving average performance of the system). 
All such criteria require for their application a certain amount of knowledge 
about the noise background. In certain cases, the character of the noise back- 
ground can be calculated from theoretical probability considerations. In general, 
however, the character of the noise has to be determined empirically from 
samples of the noise. One fundamental characteristic of the noise is its spectrum. 
Thus one is led to the problem of empirical spectral analysis of noise. 

Many aspects of a stationary stochastic process can be understood in terms
of its spectrum. The spectrum enables one (i) to investigate the physical
mechanism generating a time series, (ii) to determine the behavior of a dynamic
linear system in response to random excitations, and (iii) possibly to simulate
a time series.

Other uses of empirical spectral analysis are as operational means (i) of
transmitting or detecting signals, (ii) of classifying records of phenomena such
as brain waves, (iii) of studying radio propagation phenomena, and (iv) of
determining characteristics of control systems.

1 Reproduction in Whole or in Part is Permitted for any Purpose of the United States
Government.

The ergodic theorem indicates that in certain circumstances covariances 
and spectra may be computed exactly from a single infinite record. For some 
stochastic processes arising in the physical sciences (for example, turbulence) 
the extent of individual records available for study may be as long as desired. 
In most cases (especially in economics), the length of series available is very 
limited. In any event, if one is going to estimate covariances and spectra from 
samples, one should attempt to estimate the magnitude of sampling errors 
arising from the finite time of observation of the stochastic process and from 
the estimating procedure employed. 

In this paper I consider the problem of estimating the spectrum of a trend-free
stationary time series which is assumed to possess a spectral density function.
Many of the practical problems one encounters in connection with this problem 
are well treated by Blackman and Tukey [B7]. Here, it is my aim to discuss 
the mathematical considerations of which account must be taken in the theory 
of empirical spectral analysis, and to present possible procedures to be used 
to design an empirical spectral analysis. 

There is an extensive literature concerning the mathematical theory of
spectral analysis; the list of references at the end of the paper cites the main
published work that has appeared in the statistical literature. The
mathematical considerations discussed in this paper are based mainly on previously
published work (particularly my own). However, some of the details presented
here are new.


2. ASSUMPTIONS 


Suppose that one is observing a time series X(t), with finite second moments, 
which may be either a continuous parameter time series or a discrete parameter 
time series. A sample of size T in the continuous parameter case is written X(t),
$0 \le t \le T$, and in the discrete parameter case is written X(t), $t = 1, 2, \ldots, T$.
We seek to treat simultaneously both discrete and continuous parameter time 
series. We will often write equations in two forms, with a suffix c for the con- 
tinuous parameter case, and a suffix d for the discrete parameter case. Most 
equations will be written explicitly only for the continuous parameter case; 
it will usually be obvious how to write the corresponding equation in the dis- 
crete parameter case. 

It is assumed that the mean value function 


$$m(t) = E[X(t)] = 0 \quad \text{for all } t. \tag{2.1}$$


It should be noted however that it is possible to extend the results of this paper 
to the case in which (2.1) does not hold. 


For ease of exposition we assume that X(t) is a normal stationary time series
whose covariance function


$$R(v) = E[X(t) X(t + v)] \tag{2.2}$$

satisfies

$$\int_{-\infty}^{\infty} v^2\, |R(v)|\, dv < \infty \tag{2.3c}$$


$$\sum_{v=-\infty}^{\infty} v^2\, |R(v)| < \infty. \tag{2.3d}$$


It then follows that the time series X(t) possesses a spectral density function
$f(\omega)$ satisfying


$$f(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i\omega v} R(v)\, dv, \tag{2.4c}$$


$$f(\omega) = \frac{1}{2\pi} \sum_{v=-\infty}^{\infty} e^{-i\omega v} R(v). \tag{2.4d}$$


Further $f(\omega)$ has a continuous second derivative.

It should be pointed out at once that in assuming the validity of (2.3) we 
are postulating that the time series being studied has a continuous and smooth 
spectrum and possesses no periods corresponding to spectral lines. This is an 
assumption which must be justified in each application. 

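Given a sample of size T, the sample covariance function $R_T(v)$ is defined by


$$R_T(v) = \frac{1}{T} \int_0^{T-|v|} X(t) X(t + |v|)\, dt, \quad |v| < T; \qquad R_T(v) = 0, \quad |v| \ge T;$$


$$R_T(v) = \frac{1}{T} \sum_{t=1}^{T-|v|} X(t) X(t + |v|), \quad v = 0, \pm 1, \ldots, \pm(T - 1); \qquad R_T(v) = 0, \quad v = \pm T, \pm(T + 1), \ldots.$$

The sample spectral density function (or periodogram) is defined by


$$f_T(\omega) = \frac{1}{2\pi T} \left| \int_0^T X(t) e^{-i\omega t}\, dt \right|^2, \quad -\infty < \omega < \infty;$$


$$f_T(\omega) = \frac{1}{2\pi T} \left| \sum_{t=1}^{T} X(t) e^{-i\omega t} \right|^2, \quad -\pi \le \omega \le \pi.$$


It may be shown that $f_T(\omega)$ and $R_T(v)$ are Fourier transforms:


$$f_T(\omega) = \frac{1}{2\pi} \int_{-T}^{T} e^{-i\omega v} R_T(v)\, dv = \frac{1}{\pi} \int_0^T \cos(\omega v)\, R_T(v)\, dv,$$


$$f_T(\omega) = \frac{1}{2\pi} \sum_{|v| < T} e^{-i\omega v} R_T(v) = \frac{1}{2\pi} R_T(0) + \frac{1}{\pi} \sum_{v=1}^{T-1} \cos(\omega v)\, R_T(v),$$

$$R_T(v) = \int_{-\infty}^{\infty} e^{iv\omega} f_T(\omega)\, d\omega, \quad -\infty < v < \infty,$$


$$R_T(v) = \int_{-\pi}^{\pi} e^{iv\omega} f_T(\omega)\, d\omega, \quad v = 0, \pm 1, \ldots.$$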
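
For the discrete parameter case these definitions are easy to check numerically. The sketch below computes the sample covariance function and the periodogram directly from their definitions and compares the periodogram with the cosine series in the sample covariances. The function names are illustrative; no mean is removed, in keeping with the assumption (2.1).

```python
import cmath
import math


def sample_covariance(x, v):
    """R_T(v) = (1/T) * sum of X(t) X(t+|v|), and 0 for |v| >= T."""
    T = len(x)
    v = abs(v)
    if v >= T:
        return 0.0
    return sum(x[t] * x[t + v] for t in range(T - v)) / T


def periodogram(x, omega):
    """f_T(omega) = |sum_{t=1..T} X(t) exp(-i*omega*t)|^2 / (2*pi*T)."""
    T = len(x)
    s = sum(x[t] * cmath.exp(-1j * omega * (t + 1)) for t in range(T))
    return abs(s) ** 2 / (2.0 * math.pi * T)


def periodogram_from_covariances(x, omega):
    """The same quantity obtained as a cosine series in R_T(v)."""
    T = len(x)
    total = sample_covariance(x, 0)
    total += 2.0 * sum(math.cos(omega * v) * sample_covariance(x, v)
                       for v in range(1, T))
    return total / (2.0 * math.pi)


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5, 3.0]  # hypothetical record of length T = 4
    for omega in (0.0, 0.7, 2.5):
        print(periodogram(x, omega), periodogram_from_covariances(x, omega))
```

The two columns printed agree to rounding error, which is the Fourier-pair relation in the discrete case.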


3. FORMULATION OF THE PROBLEM OF EMPIRICAL SPECTRAL ANALYSIS


It might be thought that the problem of empirical spectral analysis is to
form an estimate of the value $f(\omega_0)$ of the spectral density function at a given
frequency $\omega_0$. However, it is not possible to consider such an estimation problem
(and to develop criteria for good estimates) without some idea of what purposes
the estimate will be used for.

In order to understand the meaning of the spectral density function $f(\omega)$
we must introduce the notion of a spectral average. For every bounded
continuous function $A(\omega)$, we define


$$J(A) = \int_{-\infty}^{\infty} A(\omega) f(\omega)\, d\omega \tag{3.1c}$$


$$J(A) = \int_{-\pi}^{\pi} A(\omega) f(\omega)\, d\omega \tag{3.1d}$$


to be a spectral average, corresponding to the spectral window $A(\omega)$. For future
reference we note that


$$J_T(A) = \int_{-\infty}^{\infty} A(\omega) f_T(\omega)\, d\omega \tag{3.2c}$$


$$J_T(A) = \int_{-\pi}^{\pi} A(\omega) f_T(\omega)\, d\omega \tag{3.2d}$$


is called the sample spectral average corresponding to the spectral window $A(\omega)$.
As one use of the spectral averages, we now show that the variance of an
important class of statistics can be represented as spectral averages.

Suppose one is observing a continuous parameter time series Y(t), $0 \le t \le T$,
which is assumed to be of the form


$$Y(t) = \beta w(t) + X(t) \tag{3.3}$$


where w(t) is a known function (for example, $w(t) = \cos \omega_0 t$), X(t) is a stationary
time series with mean 0, and $\beta$ is a constant to be estimated. The least squares
estimate of $\beta$ is


$$\beta^* = \int_0^T w_T(t) Y(t)\, dt, \qquad w_T(t) = \frac{w(t)}{\int_0^T w^2(t)\, dt}, \tag{3.4}$$

which has mean $\beta$ and variance


$$\operatorname{Var}[\beta^*] = E\left[ \int_0^T w_T(t) X(t)\, dt \right]^2. \tag{3.5}$$


Now


$$E\left[ \int_0^T w_T(t) X(t)\, dt \right]^2 = \int_0^T\!\!\int_0^T w_T(s) R(s - t) w_T(t)\, ds\, dt = \int_{-\infty}^{\infty} d\omega\, f(\omega) \left| \int_0^T w_T(t) e^{i\omega t}\, dt \right|^2. \tag{3.6}$$

Therefore


$$\operatorname{Var}[\beta^*] = \int_{-\infty}^{\infty} d\omega\, f(\omega) A(\omega), \tag{3.7}$$

where

$$A(\omega) = \left| \int_0^T w_T(t) e^{i\omega t}\, dt \right|^2. \tag{3.8}$$


From (3.7) one sees that the variance of a statistic (such as $\beta^*$) which is a linear
function of the observations is a spectral average.

The notion of the bandwidth of a spectral window plays an important role in spectral analysis. The bandwidth $\beta(A)$ of a spectral window is defined as the length of the base of a rectangle which has the same area and same maximum height as the graph of $A(\omega)$; in symbols

$$\beta(A) = \frac{\int A(\omega)\,d\omega}{\max_\omega |A(\omega)|}. \tag{3.9}$$

The frequency $\omega_0$ such that $A(\omega_0) = \max_\omega |A(\omega)|$ is called the peak frequency of the spectral window $A(\omega)$.
For $A(\omega)$ given by (3.8),

$$\int_{-\infty}^{\infty} A(\omega)\,d\omega = 2\pi \Big/ \int_0^T w^2(t)\,dt. \tag{3.10}$$


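To determine the behavior of $\max_\omega |A(\omega)|$, let us consider the special case of $w(t) = \cos \omega_0 t$. Then

$$\int_0^T w(t)\, e^{i\omega t}\,dt = -\frac{i}{2}\left\{\frac{\cos(\omega + \omega_0)T - 1}{\omega + \omega_0} + \frac{\cos(\omega - \omega_0)T - 1}{\omega - \omega_0}\right\} + \frac{1}{2}\left\{\frac{\sin(\omega + \omega_0)T}{\omega + \omega_0} + \frac{\sin(\omega - \omega_0)T}{\omega - \omega_0}\right\}, \tag{3.11}$$

$$\int_0^T w^2(t)\,dt = \frac{T}{2} + \frac{\sin 2\omega_0 T}{4\omega_0}. \tag{3.12}$$

Consequently, $\max_\omega |A(\omega)| = 1$ for all $T$, and

$$\beta(A) = 2\pi \Big/ \int_0^T w^2(t)\,dt \approx 4\pi/T. \tag{3.13}$$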


In words, (3.13) says that the spectral average $J(A)$ [which represents the variance of the least squares estimate $\beta^*$ of the amplitude $\beta$ of a sine wave $\cos \omega_0 t$ of known frequency being estimated in the presence of a noise $X(t)$ with spectral density function $f(\omega)$] corresponds to a spectral window $A(\omega)$ which has peak frequency $\omega_0$ and bandwidth of the order of the reciprocal $1/T$ of the sample size $T$. One thus sees that spectral averages corresponding to spectral windows with bandwidth of the order of $1/T$ are quantities which it is natural to seek to estimate.

The problem of empirical spectral analysis may now be formulated as follows: given a sample of length $T$ of a time series $X(t)$, to form estimates either of the value $f(\omega_0)$ of the spectral density function at a given frequency $\omega_0$, or of spectral averages $J(A)$ corresponding to spectral windows $A(\omega)$ which have peak frequency $\omega_0$ and whose bandwidth is of the order of $1/T$.

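It has been shown by Grenander and Rosenblatt [G4] that one need only consider estimates of the form

$$f_T^*(\omega_0) = \frac{1}{2\pi}\int_{-T}^{T} e^{-iv\omega_0}\, k_T(v)\, R_T(v)\,dv \tag{3.14c}$$

$$f_T^*(\omega_0) = \frac{1}{2\pi}\sum_{|v| < T} e^{-iv\omega_0}\, k_T(v)\, R_T(v) \tag{3.14d}$$

where the constants $k_T(v)$ are to be chosen as an even function of $v$. Estimates of the form of (3.14) may also be written as sample spectral averages

$$f_T^*(\omega_0) = \int_{-\infty}^{\infty} K_T(\omega_0 - \omega)\, f_T(\omega)\,d\omega \tag{3.15c}$$

$$f_T^*(\omega_0) = \int_{-\pi}^{\pi} K_T(\omega_0 - \omega)\, f_T(\omega)\,d\omega \tag{3.15d}$$

where the spectral window $K_T(\omega)$ is defined by

$$K_T(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-iv\omega}\, k_T(v)\,dv \tag{3.16c}$$

$$K_T(\omega) = \frac{1}{2\pi}\sum_{v=-\infty}^{\infty} e^{-iv\omega}\, k_T(v). \tag{3.16d}$$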
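The discrete parameter form (3.14d) can be sketched in a few lines of Python. This is a minimal illustration, not the author's computing procedure: it assumes a zero-mean series, takes $R_T(v)$ to be the biased sample covariance with divisor $T$, and uses the hypothetical function names `sample_covariance`, `periodogram` and `lag_window_estimate`. Because $k_T$ and $R_T$ are even, the complex sum reduces to a cosine sum.

```python
import cmath
import math

def sample_covariance(x, v):
    """Biased sample covariance R_T(v) = (1/T) sum_t x[t] x[t+|v|]; zero for |v| >= T.
    Assumes the series has mean zero (no mean correction is applied)."""
    T = len(x)
    v = abs(v)
    if v >= T:
        return 0.0
    return sum(x[t] * x[t + v] for t in range(T - v)) / T

def periodogram(x, omega):
    """Sample spectral density f_T(omega) = |sum_t x[t] e^{-i omega t}|^2 / (2 pi T)."""
    T = len(x)
    s = sum(x[t] * cmath.exp(-1j * omega * t) for t in range(T))
    return abs(s) ** 2 / (2.0 * math.pi * T)

def lag_window_estimate(x, omega, k):
    """Estimate of the form (3.14d): (1/2pi) sum_{|v|<T} e^{-i v omega} k(v) R_T(v).
    k is an even lag window, passed as a function of the lag v."""
    T = len(x)
    total = k(0) * sample_covariance(x, 0)
    for v in range(1, T):
        # The terms for +v and -v combine into a single cosine term.
        total += 2.0 * math.cos(v * omega) * k(v) * sample_covariance(x, v)
    return total / (2.0 * math.pi)
```

With $k_T(v) \equiv 1$ the estimate reduces to the periodogram itself, which gives a convenient check on any implementation of the covariance form.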


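We note that $f_T^*(\omega)$ may also be written as a discrete average over the values of the periodogram at the points

$$\omega_m(T) = \frac{\pi m}{T} \tag{3.17c}$$

$$\omega_m(T) = \frac{2\pi m}{2T + 1} \tag{3.17d}$$

for $m = 0, \pm 1, \pm 2, \cdots$; in virtue of the expressions

$$f_T^*(\omega) = \frac{\pi}{T}\sum_{m=-\infty}^{\infty} f_T(\omega_m(T))\, K_T(\omega - \omega_m(T)), \tag{3.18c}$$

$$f_T^*(\omega) = \frac{2\pi}{2T + 1}\sum_{m=-T}^{T} f_T(\omega_m(T))\, K_T(\omega - \omega_m(T)). \tag{3.18d}$$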


We assume that the spectral window $K_T(\omega)$ achieves its maximum at $\omega = 0$. Its bandwidth may then be written

$$\beta(K_T) = \frac{\int K_T(\omega)\,d\omega}{K_T(0)}. \tag{3.19}$$







In view of (3.16) we may also write

$$\beta(K_T) = \frac{2\pi\, k_T(0)}{\int_{-\infty}^{\infty} k_T(v)\,dv}. \tag{3.20}$$

In many cases, $k_T(0) = 1$. In these circumstances the class of estimates of the form of (3.14) may be characterized as follows.

As an estimate $f_T^*(\omega_0)$ of a spectral average $J(A)$ whose spectral window $A(\omega)$ has peak frequency $\omega_0$ and bandwidth of the order of $1/T$, we consider sample spectral averages $J_T(K_T(\omega_0 - \omega))$ whose spectral window $K_T(\omega_0 - \omega)$ has peak frequency $\omega_0$ and bandwidth of the order of $\{\int_{-T}^{T} k_T(v)\,dv\}^{-1}$, where

$$k_T(v) = \int_{-\infty}^{\infty} e^{iv\omega} K_T(\omega)\,d\omega. \tag{3.21}$$

We seek to determine that estimate of the form of (3.14) which is best according to some criterion.


4. EXISTENCE OF CONSISTENT ESTIMATES OF THE
SPECTRAL DENSITY FUNCTION


The fact that empirical spectral analysis of a stationary time series possessing a spectral density function is not an easy problem may be said to be due to the fact that the obvious estimate of the spectral density function, namely the sample spectral density function or periodogram $f_T(\omega)$ [defined by (2.7)], is not a consistent estimate of the true spectral density function $f(\omega)$.

Let us show why this holds in the case of a normal continuous parameter time series $X(t)$ possessing a spectral density function. It may be shown that the sample covariance function $R_T(v)$ is at each $v$ a consistent estimate of the true covariance function $R(v)$; that is,

$$\lim_{T \to \infty} E[\,|R_T(v) - R(v)|^2\,] = 0 \tag{4.1}$$

$$P[\lim_{T \to \infty} R_T(v) = R(v)] = 1. \tag{4.2}$$

On the other hand it may be shown that

$$\lim_{T \to \infty} E[e^{iu f_T(\omega)}] = (1 - iu f(\omega))^{-1} \tag{4.3}$$

for every real number $u$ and frequency $\omega$. Consequently, for every real number $x \ge 0$,

$$\lim_{T \to \infty} P[f_T(\omega) > x] = e^{-x/f(\omega)}. \tag{4.4}$$

In words, $f_T(\omega)$ is asymptotically exponentially distributed with mean $f(\omega)$. Unless $f(\omega) = 0$, there is no mode of probabilistic convergence in which $f_T(\omega)$ tends to $f(\omega)$ as $T$ tends to $\infty$.
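The inconsistency described by (4.4) can be seen in a small Monte Carlo sketch. The setup below is an assumption made for illustration only: Gaussian white noise with unit variance in the discrete parameter case, for which $f(\omega) = 1/(2\pi)$, evaluated at a frequency of the form $2\pi m/T$. The function names and the replication count are hypothetical. If the periodogram ordinate is roughly exponential, its coefficient of variation should stay near 1 however large $T$ becomes.

```python
import cmath
import math
import random

def periodogram(x, omega):
    """Sample spectral density f_T(omega) = |sum_t x[t] e^{-i omega t}|^2 / (2 pi T)."""
    T = len(x)
    s = sum(x[t] * cmath.exp(-1j * omega * t) for t in range(T))
    return abs(s) ** 2 / (2.0 * math.pi * T)

def periodogram_mean_and_cv(T, m, reps, rng):
    """Monte Carlo mean and coefficient of variation of f_T(2 pi m / T)
    for Gaussian white noise with unit variance (an illustrative assumption)."""
    omega = 2.0 * math.pi * m / T
    values = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(T)]
        values.append(periodogram(x, omega))
    mean = sum(values) / reps
    var = sum((v - mean) ** 2 for v in values) / (reps - 1)
    return mean, math.sqrt(var) / mean

if __name__ == "__main__":
    rng = random.Random(0)
    for T in (32, 128):
        mean, cv = periodogram_mean_and_cv(T, T // 4, 400, rng)
        # The mean stays near 1/(2 pi); the relative spread does not shrink with T.
        print(T, round(mean, 3), round(cv, 2))
```

Increasing $T$ leaves the spread of the ordinate essentially unchanged, which is the practical meaning of the statement that no mode of convergence holds.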

Nevertheless it is easy to construct sequences of estimates of $f(\omega)$ which are consistent. From the fact that $R_T(v)$ is a consistent estimate of $R(v)$, which may be written

$$\int_{-\infty}^{\infty} e^{iv\omega} f_T(\omega)\,d\omega \to \int_{-\infty}^{\infty} e^{iv\omega} f(\omega)\,d\omega, \tag{4.5}$$

it follows (by a continuity theorem for random monotone functions; see Parzen [P5]) that for every bounded continuous function $A(\omega)$

$$\int_{-\infty}^{\infty} A(\omega) f_T(\omega)\,d\omega \to \int_{-\infty}^{\infty} A(\omega) f(\omega)\,d\omega, \tag{4.6}$$

the convergence in (4.6) being in the same mode of convergence as prevails in (4.5). We assume here that (4.5) and (4.6) hold in the sense of convergence in quadratic mean.

It is now clear why the periodogram $f_T(\omega)$ is not a consistent estimate of the spectral density function $f(\omega)$, while at the same time the sample covariance function is consistent. By thinking in terms of the continuity theorem for characteristic functions, one sees that it is generally the case that convergence of characteristic functions does not imply convergence of the corresponding probability density functions. The reason usually given for the inconsistency of the periodogram (see, for example, Grenander and Rosenblatt [G4], p. 151), namely that "for large $|v|$, the statistics $R_T(v)$ are relatively unstable estimates of $R(v)$," is not at all relevant. Indeed, the quoted assertion, which is often made, does not seem to be true (Parzen [P8]).

The quoted assertion is true of the estimate defined by (for $v > 0$)

$$\tilde{R}_T(v) = \frac{1}{T - v}\int_0^{T - v} X(t)\, X(t + v)\,dt,$$

which is an unbiased estimate of $R(v)$, unlike $R_T(v)$ which is a biased estimate of $R(v)$. Many authors have advocated the use of the unbiased estimate $\tilde{R}_T(v)$ in preference to the biased estimate $R_T(v)$. However, it appears to me that $R_T(v)$ is preferable to $\tilde{R}_T(v)$ for two reasons: (i) $R_T(v)$ is a positive definite function of $v$, which is not the case for $\tilde{R}_T(v)$; (ii) the mean square error of $R_T(v)$ as an estimate of $R(v)$ is in general less than that of $\tilde{R}_T(v)$. That (i) holds is immediate. That (ii) holds is shown in Parzen [P8].

There is a multitude of ways in which one can form consistent estimates of the value $f(\omega_0)$ of the spectral density function at a given frequency $\omega_0$.

Let $A_n(\omega)$ be a sequence of spectral windows such that at the frequency $\omega_0$, as $n \to \infty$,

$$J(A_n) = \int A_n(\omega) f(\omega)\,d\omega \to f(\omega_0). \tag{4.7}$$

To each spectral average $J(A_n)$ we can find a sample size $T_n$ such that

$$E\,|J_{T_n}(A_n) - J(A_n)|^2 < \frac{1}{n}. \tag{4.8}$$

Therefore, as $n \to \infty$, $J_{T_n}(A_n) \to f(\omega_0)$ as a limit in quadratic mean, and we have shown that consistent estimates of $f(\omega_0)$ exist, if (4.7) holds.

There are many ways in which one can find spectral windows satisfying (4.7). For example, the sequence of spectral windows

$$A_n(\omega) = \frac{1}{2\pi n}\left[\frac{\sin\left(n\,\frac{\omega - \omega_0}{2}\right)}{\frac{\omega - \omega_0}{2}}\right]^2 \tag{4.9}$$

may be shown to satisfy (4.7) if $f(\omega)$ is continuous at $\omega_0$ (see Parzen [P7], p. 413). More generally, let $K(\omega)$ be a function satisfying

$$K(\omega) = K(-\omega), \qquad K(\omega) \ge 0, \qquad \int_{-\infty}^{\infty} K(\omega)\,d\omega = 1 \tag{4.10}$$

and let $B_n$ be a sequence of constants tending to $\infty$ as $n$ tends to $\infty$. Then

$$A_n(\omega) = B_n K(B_n(\omega - \omega_0)) \tag{4.11}$$

may be shown to be a sequence of spectral windows satisfying (4.7) if $f(\omega)$ is continuous at $\omega_0$. It is thus natural to consider estimates of the form

$$f_T^*(\omega) = B\int_{-\infty}^{\infty} K(B(\omega - \lambda))\, f_T(\lambda)\,d\lambda, \tag{4.12}$$

for a suitable kernel $K(\omega)$ and constant $B$. In the sequel we shall study how the properties of the estimate defined by (4.12) depend on the choice of the constant $B$ and the spectral averaging kernel $K(\omega)$.


5. SOME SUGGESTED ESTIMATES AND THEIR CLASSIFICATION INTO TYPES


In order to specify an estimate of the form of (3.14), one must state the covariance averaging kernel [or, in the terminology of Blackman and Tukey ([B7], p. 12), the lag window] $k_T(v)$. One can consider two methods of generating kernels $k_T(v)$, which include as special cases most of the estimates which have been suggested by various authors.

Let $h(u)$ be a bounded, even, square integrable function, defined for all real $u$, such that $|1 - h(u)|/|u|$ is a bounded function of $u$.

The first class of estimates $f_T^*(\omega)$ that one can consider are defined by (3.14), with $k_T(v)$ defined by

$$k_T(v) = h\!\left(\frac{v}{M_T}\right) \tag{5.1}$$

where the $M_T$ are positive constants tending to $\infty$ as $T$ tends to $\infty$ in such a way that $M_T/T$ tends to 0. We call these the estimates of algebraic type.

The second class of estimates $f_T^*(\omega)$ that one can consider are defined by (3.14), with $k_T(v)$ given by

$$k_T(v) = h(A_T e^{\alpha |v|}) \tag{5.2}$$

where the $A_T$ are positive constants tending to 0 as $T$ tends to $\infty$ in such a way that $(\log A_T)/T$ tends to 0, and $\alpha$ is a positive constant. We call these the estimates of exponential type.

Estimates of algebraic and exponential type were introduced, and their 
asymptotic properties extensively discussed, by Parzen [P4]. The notion of 
an estimate of exponential type is a generalization of the estimate suggested 
by Lomnicki and Zaremba [L1]. Most of the estimates that have been sug- 
gested by various authors are of algebraic type, and correspond to different 
choices of the kernel h(u). 

Bartlett's estimate [B3] corresponds to

$$h(u) = \begin{cases} 1 - |u|, & |u| \le 1 \\ 0, & |u| > 1. \end{cases} \tag{5.3}$$

The estimate whose use Tukey calls "hanning" ([B7], p. 98) corresponds to

$$h(u) = \begin{cases} \tfrac{1}{2}(1 + \cos \pi u), & |u| \le 1 \\ 0, & |u| > 1. \end{cases} \tag{5.4}$$

The estimate whose use Tukey calls "hamming" ([B7], p. 98) corresponds to

$$h(u) = \begin{cases} 0.54 + 0.46 \cos \pi u, & |u| \le 1 \\ 0, & |u| > 1. \end{cases} \tag{5.5}$$

An estimate generalizing (5.4) and (5.5) is obtained by choosing

$$h(u) = \begin{cases} 1 - 2a + 2a \cos \pi u, & |u| \le 1 \\ 0, & |u| > 1 \end{cases} \tag{5.6}$$

for some constant $a$.
An estimate usually called the truncated periodogram corresponds to

$$h(u) = \begin{cases} 1, & |u| \le 1 \\ 0, & |u| > 1. \end{cases} \tag{5.7}$$

Another estimate, usually associated with the name of Daniell, corresponds to the kernel

$$h(u) = \frac{\sin u}{u}. \tag{5.8}$$

In [P2], Parzen suggested kernels of the form

$$h(u) = \begin{cases} 1 - |u|^q, & |u| \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{5.9}$$

for some constant $q > 1$ to be determined. In a paper presented at the 1957 I.M.S. Annual Meeting Parzen orally suggested the kernel

$$h(u) = \begin{cases} 1 - 6u^2 + 6|u|^3, & |u| \le \tfrac{1}{2} \\ 2(1 - |u|)^3, & \tfrac{1}{2} \le |u| \le 1 \\ 0, & \text{otherwise.} \end{cases} \tag{5.10}$$













Other possible choices of kernel are given in [P3] and [P4]. 
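The kernels (5.3) to (5.10) are simple enough to write down directly. The following Python sketch does so; the function names are hypothetical labels chosen for this illustration, and `lag_window` shows how an algebraic-type lag window $k_T(v) = h(v/M_T)$ of (5.1) would be built from any of them.

```python
import math

def bartlett(u):
    """Kernel (5.3)."""
    return 1.0 - abs(u) if abs(u) <= 1.0 else 0.0

def general_cosine(u, a):
    """Kernel (5.6): 1 - 2a + 2a cos(pi u) on |u| <= 1."""
    return 1.0 - 2.0 * a + 2.0 * a * math.cos(math.pi * u) if abs(u) <= 1.0 else 0.0

def hanning(u):
    """Kernel (5.4), the case a = 0.25 of (5.6)."""
    return general_cosine(u, 0.25)

def hamming(u):
    """Kernel (5.5), the case a = 0.23 of (5.6)."""
    return general_cosine(u, 0.23)

def truncated(u):
    """Kernel (5.7), the truncated periodogram."""
    return 1.0 if abs(u) <= 1.0 else 0.0

def daniell(u):
    """Kernel (5.8), sin(u)/u, with the limiting value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(u) / u

def power_kernel(u, q):
    """Kernel (5.9): 1 - |u|^q on |u| <= 1."""
    return 1.0 - abs(u) ** q if abs(u) <= 1.0 else 0.0

def parzen(u):
    """Kernel (5.10)."""
    a = abs(u)
    if a <= 0.5:
        return 1.0 - 6.0 * a ** 2 + 6.0 * a ** 3
    if a <= 1.0:
        return 2.0 * (1.0 - a) ** 3
    return 0.0

def lag_window(h, M):
    """Algebraic-type lag window k_T(v) = h(v / M_T) of (5.1)."""
    return lambda v: h(v / M)
```

Each kernel equals 1 at $u = 0$, and the two branches of (5.10) agree at $|u| = \tfrac{1}{2}$, where both give 0.25.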


There are two properties possessed by some of the estimates $f_T^*(\omega)$ considered which should be noted.


An estimate $f_T^*(\omega)$ is said to be of non-negative type if necessarily

$$f_T^*(\omega) \ge 0 \quad \text{for all } \omega. \tag{5.11}$$

Not all of the estimates considered are of non-negative type. In order for an estimate (3.14) to satisfy (5.11) it is necessary and sufficient that the corresponding spectral window $K_T(\omega)$ satisfy

$$K_T(\omega) \ge 0 \quad \text{for all } \omega. \tag{5.12}$$

Estimates of algebraic type corresponding to the kernels (5.3), (5.8), and (5.10) are of non-negative type. The other kernels listed above do not yield estimates of non-negative type. Estimates of exponential type are never of non-negative type.


An estimate $f_T^*(\omega)$ of the form of (3.14) is said to be of truncated type if there exists a real number $m_T < T$ such that

$$k_T(v) = 0 \quad \text{for } |v| > m_T. \tag{5.13}$$

If there exists a smallest real number $m_T$ satisfying (5.13), we call it the truncation point of the estimate. The advantage of using an estimate of truncated type is that it may involve less computation, since not all the sample covariances $R_T(v)$ have to be computed in order to form the estimate.

A kernel $h(u)$ satisfying

$$h(u) > 0 \ \text{ for } |u| < 1, \qquad h(u) = 0 \ \text{ for } |u| > 1 \tag{5.14}$$

gives rise to algebraic estimates with truncation point

$$m_T = M_T \tag{5.15}$$

and to exponential estimates with truncation point

$$m_T = -\frac{1}{\alpha}\log A_T. \tag{5.16}$$

Except for (5.8), all the kernels $h(u)$ listed above satisfy (5.14).

The quantity $m_T$, defined by (5.15) and (5.16) respectively for estimates of algebraic and exponential type, will be called the truncation point of the estimate even if the estimate is not of truncated type. It will turn out that the statistical properties of estimates are best expressed in terms of the truncation point.

We assume that the kernel $h(u)$ satisfies

$$|1 - h(u)| \le h_q |u|^q \quad \text{for all } u \tag{5.17}$$

for some exponent $q > 0$ and constant $h_q$. The largest real number $q$ such that the kernel $h(u)$ satisfies (5.17) for some finite $h_q$ is called the characteristic exponent of the kernel $h(u)$.

The constant $h_q$ satisfying (5.17) will usually be the limit defined by

$$h_q = \lim_{u \to 0} \frac{1 - h(u)}{|u|^q}. \tag{5.18}$$

In [P2] the characteristic exponent $q$ was defined as the largest real number (assumed to exist) such that the limit in (5.18) exists and is non-zero. The corresponding value of $h_q$ is called the characteristic coefficient (see Table I). A kernel $h(u)$ is said to be a normalized kernel if its characteristic coefficient $h_q$ is equal to 1, or alternately if its power series expansion (for positive $u$ near 0) may be written $h(u) = 1 - u^q + \cdots$.
If one desires to restrict oneself to kernels which lead to necessarily non-negative estimates $f_T^*(\omega)$, then $q$ can be at most 2; in symbols, if

$$h(u) = \int_{-\infty}^{\infty} e^{iu\omega} H(\omega)\,d\omega \tag{5.19}$$

where $H(\omega) \ge 0$, $H(-\omega) = H(\omega)$, $\int_{-\infty}^{\infty} H(\omega)\,d\omega = 1$, $\int_{-\infty}^{\infty} \omega^2 H(\omega)\,d\omega < \infty$, then

$$|1 - h(u)| \le h_2 |u|^2, \tag{5.20}$$

$$h_2 = \frac{1}{2}\int_{-\infty}^{\infty} \omega^2 H(\omega)\,d\omega. \tag{5.21}$$

It will turn out that, on the basis of the various criteria treated here, from a practical point of view one does not lose much in restricting oneself to kernels of characteristic exponent $q = 2$.
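The limit (5.18) is easy to probe numerically. The sketch below evaluates the ratio $(1 - h(u))/|u|^q$ at a small value of $u$ for three of the kernels of this section and also forms a rescaled kernel $h(u/h_q^{1/q})$, whose characteristic coefficient should be 1. The helper names and the choice $u = 10^{-3}$ are assumptions made for illustration; the kernels are restated so that the block runs on its own.

```python
import math

def bartlett(u):
    return 1.0 - abs(u) if abs(u) <= 1.0 else 0.0

def hanning(u):
    return 0.5 * (1.0 + math.cos(math.pi * u)) if abs(u) <= 1.0 else 0.0

def parzen(u):
    a = abs(u)
    if a <= 0.5:
        return 1.0 - 6.0 * a ** 2 + 6.0 * a ** 3
    return 2.0 * (1.0 - a) ** 3 if a <= 1.0 else 0.0

def characteristic_ratio(h, q, u=1e-3):
    """Finite-u approximation to the limit (5.18): (1 - h(u)) / |u|^q."""
    return (1.0 - h(u)) / abs(u) ** q

def normalize(h, hq, q):
    """Rescaled kernel h(u / hq^(1/q)); its characteristic coefficient is 1."""
    scale = hq ** (1.0 / q)
    return lambda u: h(u / scale)

if __name__ == "__main__":
    print(characteristic_ratio(bartlett, 1))  # close to h_1 = 1
    print(characteristic_ratio(hanning, 2))   # close to h_2 = pi^2 / 4
    print(characteristic_ratio(parzen, 2))    # close to h_2 = 6
    print(characteristic_ratio(normalize(parzen, 6.0, 2), 2))  # close to 1
```

Trying a value of $q$ larger than the characteristic exponent makes the ratio grow without bound as $u$ shrinks, while a smaller $q$ drives it to zero, which is how the exponent can be identified in practice.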
















6. VARIANCE AND BANDWIDTH 


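An estimate of algebraic type may be written either as a sample covariance average

$$f_T^*(\omega) = \frac{1}{\pi}\int_0^T \cos v\omega\; h\!\left(\frac{v}{M_T}\right) R_T(v)\,dv \tag{6.1c}$$

$$f_T^*(\omega) = \frac{1}{2\pi}\sum_{|v| < T} \cos v\omega\; h\!\left(\frac{v}{M_T}\right) R_T(v) \tag{6.1d}$$

or as a sample spectral average

$$f_T^*(\omega) = M_T \int_{-\infty}^{\infty} H(M_T(\lambda - \omega))\, f_T(\lambda)\,d\lambda; \tag{6.2}$$

(6.2) holds both in the discrete and continuous parameter cases, if in the discrete parameter case the periodogram $f_T(\lambda)$ is defined on the whole real line as an even function with period $2\pi$. An alternate way in which $f_T^*(\omega)$ may be written in the discrete parameter case is

$$f_T^*(\omega) = \int_{-\pi}^{\pi} d\lambda\, f_T(\lambda)\, M_T \sum_{n=-\infty}^{\infty} H(M_T(\lambda - \omega - 2\pi n)). \tag{6.3}$$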


It may be shown that if $X(t)$ is a normal time series with spectral density function $f(\omega)$, then for an estimate $f_T^*(\omega)$ of algebraic type it holds approximately that*

$$\operatorname{Var}[f_T^*(\omega)] = \frac{M_T}{T}\, f^2(\omega)\, G(h)\, \{1 + \delta(0, \omega)\} \tag{6.4}$$

where we define

$$G(h) = \int_{-\infty}^{\infty} h^2(u)\,du \tag{6.5}$$

and it is assumed that in the neighborhood of $\omega$ the true spectral density function $f(\lambda)$ varies more slowly than does the spectral window $M_T H(M_T(\omega - \lambda))$.





* As pointed out by Lomnicki and Zaremba ([L2], p. 134), in the discrete parameter case, (6.4) should contain a factor $\{1 + \delta(\omega, \pi)\}$ on the right hand side.

If $f_T^*(\omega)$ is of exponential type it may be shown that (6.4) holds if one interprets $M_T$ as the truncation point defined by (5.16), and replaces $G(h)$ by 2. It should be noted that the truncated periodogram [the estimate corresponding to the kernel $h(u)$ given by (5.7)] is an estimate both of algebraic type and of exponential type, and is completely characterized by its truncation point $M_T$. It may be shown that formulas for variance (and bandwidth) of an exponential estimate do not depend on the choice of kernel $h(u)$, and indeed exactly correspond to the formulas obtained for the truncated periodogram treated as an estimate of algebraic type.

In view of the dependence of the variance of an estimate $f_T^*(\omega)$ on its truncation point, it is also striking to note the dependence on truncation point of its bandwidth. The bandwidth $\beta[f_T^*(\omega)]$ of an estimate $f_T^*(\omega)$ is defined as the bandwidth of its corresponding spectral window. For an estimate of algebraic type,

$$\beta[f_T^*(\omega)] = \frac{1}{M_T H(0)} = \frac{2\pi}{M_T \int_{-\infty}^{\infty} h(u)\,du}. \tag{6.6}$$

It may be shown that for an estimate of exponential type, $\beta[f_T^*(\omega)] = \pi/m_T$.
One sees that for a given sample size $T$ and choice of kernel $h(u)$, the corresponding estimate $f_T^*(\omega)$, whether of algebraic or exponential type, has the property that truncation point and bandwidth are inversely proportional, while variance and truncation point are directly proportional. Consequently, variance and bandwidth are inversely proportional.
From (6.4) and (6.6) we obtain the fundamental fact that

$$\operatorname{Var}[f_T^*(\omega)]\;\beta[f_T^*(\omega)] = \frac{1}{T}\, f^2(\omega)\, G_1(h), \tag{6.7}$$

defining

$$G_1(h) = \frac{G(h)}{H(0)} = \frac{2\pi \int_{-\infty}^{\infty} h^2(u)\,du}{\int_{-\infty}^{\infty} h(u)\,du}.$$

The values of $G_1(h)$ are given in Table I for various kernels $h(u)$.

From (6.7) one sees that the product of bandwidth and variance is a constant. It is also remarkable that (6.7) holds for all frequencies, including $\omega = 0$ (and $\omega = \pi$ in the discrete parameter case), since at these frequencies the variance is twice and the bandwidth is half the values given by (6.4) and (6.6) respectively (if in (6.4) one ignores the factor in braces).

One may regard $f_T^*(\omega)$ as being an estimate of a spectral average corresponding to a spectral window with peak frequency $\omega$ and bandwidth $\beta[f_T^*(\omega)]$. For fixed bandwidth one sees from (6.7) that the variance of the estimate is directly proportional to $G_1(h)$. Consequently one prefers a kernel $h(u)$ for which the value of the coefficient $G_1(h)$ is least. From Table I one sees that kernel 1 is best, while kernel 8 is not too far behind. For reasons connected with mean square error criteria (see Parzen [P9] and [P2]) it seems preferable to choose a kernel with characteristic exponent $q = 2$. Consequently I prefer kernel 8 [which in Table II is called kernel $h_3(u)$].














From (6.4) one obtains yet another ground for preferring the spectral window corresponding to $h_3(u)$. For a given choice of truncated kernel $h(u)$ and truncation point $M_T$, the variance of the estimate is directly proportional to $G(h)$. Consequently one prefers a kernel $h(u)$ for which $G(h)$ is smallest; from Table I one sees that kernel 8 again compares favorably.
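The quantities of this section can be evaluated numerically for any kernel that vanishes outside $|u| \le 1$. The sketch below is an illustration under stated assumptions: it uses a plain midpoint rule, takes the relative variance $(M_T/T)\,G(h)$ from (6.4) with the factor in braces ignored, takes the bandwidth as $2\pi/(M_T \int h)$ in line with (3.20), and takes $G_1(h) = 2\pi G(h)/\int h(u)\,du$, which is the value implied by those two formulas together with the design relation (7.10). All function names are hypothetical.

```python
import math

def bartlett(u):
    return 1.0 - abs(u) if abs(u) <= 1.0 else 0.0

def parzen(u):
    a = abs(u)
    if a <= 0.5:
        return 1.0 - 6.0 * a ** 2 + 6.0 * a ** 3
    return 2.0 * (1.0 - a) ** 3 if a <= 1.0 else 0.0

def integrate(f, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n equal subintervals."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

def G(h):
    """G(h) of (6.5), assuming h vanishes outside |u| <= 1."""
    return integrate(lambda u: h(u) ** 2, -1.0, 1.0)

def area(h):
    """Integral of h(u), assuming h vanishes outside |u| <= 1."""
    return integrate(h, -1.0, 1.0)

def relative_variance(h, M, T):
    """Var[f*] / f^2 from (6.4), ignoring the factor in braces."""
    return (M / T) * G(h)

def bandwidth(h, M):
    """Bandwidth 2 pi / (M * integral of h), following (3.20) with k_T(0) = 1."""
    return 2.0 * math.pi / (M * area(h))

def G1(h):
    """Assumed form of the coefficient in (6.7): 2 pi G(h) / integral of h."""
    return 2.0 * math.pi * G(h) / area(h)
```

For a fixed kernel, `T * bandwidth(h, M) * relative_variance(h, M, T)` returns the same number for every truncation point `M`, which is the constancy of the bandwidth-variance product described above.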













7. DESIGN RELATIONS

The signal to noise ratio of an estimate is defined by

$$\operatorname{SNR}[f_T^*(\omega)] = \frac{E[f_T^*(\omega)]}{\sigma[f_T^*(\omega)]}. \tag{7.1}$$

In words, $\operatorname{SNR}[f_T^*(\omega)]$ is the mean divided by the standard deviation [or the reciprocal of the coefficient of variation] of the random variable $f_T^*(\omega)$. The properties of the signal to noise ratio are discussed in Parzen ([P7], p. 378). For example, if $f_T^*(\omega)$ may be regarded as being approximately normally distributed, then in order that

$$P\left[\,\left|\frac{f_T^*(\omega) - E[f_T^*(\omega)]}{E[f_T^*(\omega)]}\right| \le \delta\,\right] \ge 0.95, \tag{7.2}$$

where $\delta$ is a given number, one should have a signal to noise ratio satisfying

$$\operatorname{SNR}[f_T^*(\omega)] \ge 2/\delta. \tag{7.3}$$

The notion of signal to noise ratio may also be expressed in terms of equivalent degrees of freedom. A random variable $X$, which has a $\chi^2$ distribution with $n$ degrees of freedom, has a signal to noise ratio

$$\operatorname{SNR}[X] = \frac{n}{(2n)^{1/2}} = \left(\frac{n}{2}\right)^{1/2}. \tag{7.4}$$

Therefore

$$n = 2\left(\frac{E[X]}{\sigma[X]}\right)^2 = 2\{\operatorname{SNR}[X]\}^2. \tag{7.5}$$

For a (positive) random variable $X$, we may define the number defined by (7.5) as its equivalent degrees of freedom, denoted $\operatorname{EDF}[X]$. In particular,

$$\operatorname{EDF}[f_T^*(\omega)] = 2\{\operatorname{SNR}[f_T^*(\omega)]\}^2. \tag{7.6}$$

Now approximately

$$E[f_T^*(\omega)] = f(\omega). \tag{7.7}$$

From (6.7) and (7.7) one obtains the fundamental relation

$$\frac{\operatorname{Var}[f_T^*(\omega)]}{E^2[f_T^*(\omega)]} = \frac{G_1(h)}{T\,\beta[f_T^*(\omega)]} = \frac{M_T}{T}\, G(h). \tag{7.8}$$

Essentially (7.8) is valid under the assumption that in the neighborhood of $\omega$ the true spectral density function $f(\lambda)$ varies more slowly than does the spectral window $M_T H(M_T(\omega - \lambda))$. Equivalently, one may write (7.8) in the form

$$\operatorname{EDF}[f_T^*(\omega)] = \frac{2T\,\beta[f_T^*(\omega)]}{G_1(h)} = \frac{2T}{M_T\, G(h)}. \tag{7.9}$$

Now (7.9) constitutes 2 equations in 4 parameters:

$T$, the sample size;
$M_T$, the truncation point;
$\beta = \beta[f_T^*(\omega)]$, the bandwidth of the spectral average of which $f_T^*(\omega)$ is an unbiased estimate;
$n = \operatorname{EDF}[f_T^*(\omega)]$, the number of equivalent degrees of freedom of the estimate.

If any two of these parameters are given, then the other two are determined. In particular we write out design relations for the two cases of greatest practical interest: case 1, $\beta$ and $n$ given; case 2, $T$ and $n$ given.

Case 1: One desires $f_T^*(\omega)$ to (i) be an estimate of a spectral average corresponding to a spectral window with bandwidth $\beta$ and peak frequency $\omega$, and (ii) have a given number $n$ of equivalent degrees of freedom (or a given signal to noise ratio $(n/2)^{1/2}$). Design relations: Choose a sample of length $T$ satisfying

$$T = \frac{n\, G_1(h)}{2\beta} \tag{7.10}$$

where $G_1(h)$ is the coefficient of the kernel $h(u)$ used in forming the algebraic estimate $f_T^*(\omega)$. Choose the truncation point of the estimate to be

$$M_T = \frac{1}{\beta}\,\frac{1}{H(0)} = \frac{2\pi}{\beta \int_{-\infty}^{\infty} h(u)\,du}. \tag{7.11}$$

The estimate

$$f_T^*(\omega) = \frac{1}{2\pi}\sum_{|v| < T} \cos v\omega\; h\!\left(\frac{v}{M_T}\right) R_T(v) \tag{7.12d}$$

then has $n$ equivalent degrees of freedom, and may be regarded as an unbiased estimate of a spectral average with center frequency $\omega$ and bandwidth $\beta$.

Case 2: One has available a sample of fixed length $T$, and desires estimates which are statistically stable in the sense that they possess a given number $n$ of equivalent degrees of freedom.

Design relations: To form an estimate which possesses a given number $n$ of equivalent degrees of freedom one should choose

$$M_T = \frac{2T}{n\, G(h)}. \tag{7.13}$$

The estimate $f_T^*(\omega)$ thus formed should be regarded as an estimate of a spectral average corresponding to a spectral window with bandwidth

$$\beta = \frac{n}{2T}\, G_1(h). \tag{7.14}$$
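The two design cases translate directly into arithmetic. The sketch below implements (7.10), (7.11), (7.13) and (7.14) as written; the function names are hypothetical, $G_1(h)$ is taken to be $2\pi G(h)/\int h(u)\,du$ as implied by (7.9), and the constants used in the example, $G(h) = 151/280$ and $\int h(u)\,du = 3/4$, are values worked out here for the kernel (5.10) rather than figures quoted from Table I.

```python
import math

def design_case1(beta, n, G, area):
    """Case 1: bandwidth beta and equivalent degrees of freedom n are given.
    Returns (T, M_T) from (7.10) and (7.11)."""
    G1 = 2.0 * math.pi * G / area          # assumed form of G_1(h)
    T = n * G1 / (2.0 * beta)              # (7.10)
    M = 2.0 * math.pi / (beta * area)      # (7.11)
    return T, M

def design_case2(T, n, G, area):
    """Case 2: sample length T and equivalent degrees of freedom n are given.
    Returns (M_T, beta) from (7.13) and (7.14)."""
    G1 = 2.0 * math.pi * G / area          # assumed form of G_1(h)
    M = 2.0 * T / (n * G)                  # (7.13)
    beta = n * G1 / (2.0 * T)              # (7.14)
    return M, beta

if __name__ == "__main__":
    # Constants computed for kernel (5.10); an assumption of this sketch.
    G_PARZEN, AREA_PARZEN = 151.0 / 280.0, 0.75
    M, beta = design_case2(1000.0, 30.0, G_PARZEN, AREA_PARZEN)
    print(M, beta)
    # Feeding the resulting bandwidth back into case 1 recovers T and M.
    print(design_case1(beta, 30.0, G_PARZEN, AREA_PARZEN))
```

The round trip between the two cases is a useful sanity check, since both are rearrangements of the same pair of equations (7.9).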
8. A COMPARISON OF NORMALIZED SPECTRAL WINDOWS


If $h(u)$ is a kernel with characteristic exponent $q$ and characteristic coefficient $h_q$, define

$$h^*(u) = h(u / h_q^{1/q}). \tag{8.1}$$

It is clear that $h^*(u)$ is a kernel with characteristic exponent $q$ and characteristic coefficient $h_q^* = 1$. We call the normalized kernel $h^*(u)$ the normalization of $h(u)$. Its Fourier transform is written

$$H^*(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \cos u\omega\; h^*(u)\,du. \tag{8.2}$$

It is easily verified that the normalization $H^*(\omega)$ of a spectral window $H(\omega)$ is given by

$$H^*(\omega) = h_q^{1/q}\, H(\omega\, h_q^{1/q}). \tag{8.3}$$

The coefficients $G(h^*)$ and $G_1(h^*)$ are given by

$$G(h^*) = h_q^{1/q}\, G(h) \tag{8.4}$$

$$G_1(h^*) = G_1(h). \tag{8.5}$$

In the preceding section it was seen that the coefficient $G_1(h)$ could be used to compare kernels $h(u)$, that kernel being more efficient for which $G_1(h)$ is least. From (8.5) one sees that the coefficient $G_1(h)$ depends only on the normalized kernel $h^*(u)$.


[Figure 1. Normalized Spectral Windows, with an inset showing two of the windows on an enlarged scale for $\omega > 4.0$.]

It seems natural that to compare two kernels $h_1(u)$ and $h_2(u)$ one should normalize them to have the same shape at $u = 0$, which is equivalent to comparing their normalized kernels (or lag windows) $h_1^*(u)$ and $h_2^*(u)$. It is even more instructive to compare the normalized spectral windows which are plotted in Figure 1. It appears that $h_3(u)$ yields a normalized spectral window which has the least troublesome side lobes. One prefers an estimate whose side lobes are smallest. High side lobes are unpleasant if the true spectral density function $f(\omega)$ has a sharp peak, for then subsidiary smaller (and spurious) peaks will be produced in the estimate at all frequencies for which a side lobe coincides with the main peak.

An alternative comparison: Let $H(\omega)$ be a spectral window such that

$$\int_{-\infty}^{\infty} H(\omega)\,d\omega = 1. \tag{8.6}$$

We have previously defined the bandwidth of the spectral window to be the reciprocal of its maximum value, usually equal to $1/H(0)$. This corresponds to regarding the spectral window $H(\omega)$ as being equivalent to a rectangular spectral window which has the same area and maximum height.

However, other possible definitions of bandwidth suggest themselves. In particular, suppose that the "variance" of the spectral window, defined by

$$\sigma^2[H] = \int_{-\infty}^{\infty} \omega^2 H(\omega)\,d\omega, \tag{8.7}$$

is finite. If one regards the spectral window $H(\omega)$ as being equivalent to a rectangular spectral window which has unit area and the same variance, the bandwidth of the spectral window may be defined as $\sqrt{12}\,\sigma[H]$. Under this definition of bandwidth, the bandwidth of an estimate $f_T^*(\omega)$ of algebraic type is defined to be given by

$$\beta[f_T^*(\omega)] = \frac{\sqrt{12}\,\sigma[H]}{M_T}, \tag{8.8}$$

since

$$\int_{-\infty}^{\infty} \omega^2 M_T H(M_T \omega)\,d\omega = M_T^{-2}\,\sigma^2[H].$$

With this definition of bandwidth all the normalized spectral windows sketched in Figure 1 have the same bandwidth as the rectangular spectral window $H_4(\omega)$.

If the kernel $h(u)$ has characteristic exponent 2, then by (5.21) its characteristic coefficient $h_2$ and its variance $\sigma^2[H]$ are related by $\sigma^2[H] = 2h_2$. Consequently the bandwidth of the estimate may be written

$$\beta[f_T^*(\omega)] = \frac{\{24 h_2\}^{1/2}}{M_T}. \tag{8.9}$$

From (8.9) one sees that all normalized kernels give rise to estimates with the same bandwidth.

Instead of the fundamental relation (7.8) one now obtains the fundamental relation (for an estimate of algebraic type corresponding to a kernel $h(u)$ with characteristic exponent 2)

$$\frac{\operatorname{Var}[f_T^*(\omega)]}{E^2[f_T^*(\omega)]} = \frac{G_2(h)}{T\,\beta[f_T^*(\omega)]} \tag{8.10}$$

defining

$$G_2(h) = \{24 h_2\}^{1/2} \int_{-\infty}^{\infty} h^2(u)\,du. \tag{8.11}$$

Equivalently, one may write

$$\operatorname{EDF}[f_T^*(\omega)] = \frac{2T\,\beta[f_T^*(\omega)]}{G_2(h)}. \tag{8.12}$$

The coefficient $G_2(h)$ plays the same role with regard to our new definition of bandwidth as the coefficient $G_1(h)$ plays with regard to our previous definition. It should be noted that

$$G_2(h) = G_2(h^*) = \sqrt{24}\; G(h^*) \tag{8.13}$$

so that the coefficient $G_2(h)$, like $G_1(h)$, depends only on the normalized kernel. Some values for $G_2(h)$ are given in Table II. One sees that $h_1(u)$, rather than $h_3(u)$, has the smallest value of $G_2(h)$.

The design relations (7.10) and (7.14) now hold with $G_1(h)$ replaced by $G_2(h)$. In (7.11), one should replace $1/H(0)$ by $\{24 h_2\}^{1/2}$. It is interesting that with the definition of bandwidth given by (8.9) one requires a larger sample size $T$ and a larger truncation point $M_T$ [in order to achieve an estimate of a spectral average with a given bandwidth $\beta$ and equivalent number $n$ of degrees of freedom] than one does if bandwidth is defined by (6.6).
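The two coefficients can be compared side by side for a few kernels of characteristic exponent 2. The constants in the sketch below ($h_2$, $\int h$, and $G(h)$ for each kernel) were worked out by hand for this illustration and are not copied from Table I or Table II; the kernel labels are hypothetical, with "quadratic" standing for (5.9) with $q = 2$. $G_1$ is taken as $2\pi G(h)/\int h(u)\,du$ and $G_2$ as in (8.11).

```python
import math

# Hand-computed constants for four kernels with characteristic exponent 2.
# These values are assumptions of this sketch, not entries quoted from the tables.
KERNELS = {
    "quadratic": {"h2": 1.0, "area": 4.0 / 3.0, "G": 16.0 / 15.0},   # 1 - u^2
    "hanning": {"h2": math.pi ** 2 / 4.0, "area": 1.0, "G": 0.75},   # (5.4)
    "parzen": {"h2": 6.0, "area": 0.75, "G": 151.0 / 280.0},         # (5.10)
    "daniell": {"h2": 1.0 / 6.0, "area": math.pi, "G": math.pi},     # (5.8)
}

def G1(c):
    """Coefficient for the equal-height bandwidth: 2 pi G(h) / integral of h."""
    return 2.0 * math.pi * c["G"] / c["area"]

def G2(c):
    """Coefficient (8.11) for the equal-variance bandwidth: sqrt(24 h_2) G(h)."""
    return math.sqrt(24.0 * c["h2"]) * c["G"]

def G_normalized(c):
    """G(h*) from (8.4) with q = 2: sqrt(h_2) G(h)."""
    return math.sqrt(c["h2"]) * c["G"]

if __name__ == "__main__":
    for name, c in KERNELS.items():
        print(name, round(G1(c), 3), round(G2(c), 3))
```

Under these constants the ordering of the kernels changes with the definition of bandwidth, and $G_2$ is never smaller than $G_1$, the two coinciding only for the rectangular window of (5.8).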


[Table II. Normalized Spectral Windows. Columns: $h(u)$, $h^*(u)$, $1/H^*(0)$, $G_2(h)$. Rows: $h_1(u) = 1 - u^2$ for $|u| \le 1$; $h_2(u) = \tfrac{1}{2}(1 + \cos \pi u)$ for $|u| \le 1$, with $h_2^*(u) = h_2(2u/\pi)$; $h_3(u)$ as in (5.10), with $h_3^*(u) = h_3(u/\sqrt{6})$; $h_4(u) = \sin u / u$, with $h_4^*(u) = h_4(\sqrt{6}\,u)$. The numerical entries are not legible in this copy.]


9. FIGURES OF MERIT BASED ON MEAN SQUARE ERROR


In order to choose an estimate $f_T^*(\omega)$ it would seem that one must adopt some figure of merit for estimates. A number of figures of merit may be considered (see [P4], p. 308), all based on the notion of mean square error.

If $\theta$ is a parameter to be estimated, and $\theta^*$ is an estimate, then the mean square error of $\theta^*$ as an estimate of $\theta$ may be written

$$E\,|\theta^* - \theta|^2 = \operatorname{Var}[\theta^*] + b^2(\theta^*) \tag{9.1}$$

where $\operatorname{Var}[\theta^*] = E\,|\theta^* - E[\theta^*]|^2$ is the variance and

$$b(\theta^*) = E[\theta^*] - \theta \tag{9.2}$$

is the bias of the estimate.
The spectral density function $f(\omega)$ is not a single parameter but is a function (or curve). Consequently at least three different criteria suggest themselves. One may judge estimates on the basis of (i) their mean square error

$$E[\,|f_T^*(\omega) - f(\omega)|^2\,] = \operatorname{Var}[f_T^*(\omega)] + b^2[f_T^*(\omega)] \tag{9.3}$$

at individual frequencies, (ii) their mean square integrated error

$$\int E[\,|f_T^*(\omega) - f(\omega)|^2\,]\,d\omega = \int \operatorname{Var}[f_T^*(\omega)]\,d\omega + \int b^2[f_T^*(\omega)]\,d\omega, \tag{9.4}$$

or (iii) their mean square maximum error

$$E[\sup_\omega |f_T^*(\omega) - f(\omega)|^2], \tag{9.5}$$

an upper bound for which is provided by the inequality

$$E^{1/2}[\sup_\omega |f_T^*(\omega) - f(\omega)|^2] \le E^{1/2}[\sup_\omega |f_T^*(\omega) - E[f_T^*(\omega)]|^2] + \sup_\omega |E[f_T^*(\omega)] - f(\omega)|. \tag{9.6}$$


Some authors ([G4], [P2]) have sought to judge estimates on the basis of their mean square error at individual frequencies, defined by (9.3). Lomnicki and Zaremba [L1] have stated that one ought to judge estimates on the basis of their integrated mean square error, defined by (9.4). While this criterion may be suitable if one is interested in a goodness of fit test [Z1] for spectral density functions, it does not seem to be an accurate reflection of our aims when our interest is in estimating the spectral density function. It seems to me that in estimating the spectral density function $f(\omega)$, one desires an estimate $f_T^*(\omega)$ which minimizes the mean square maximum error, defined by (9.5). Given an estimate for which (9.5) is known, one could form confidence bands for the spectral density function using various forms of Chebyshev's inequality. In particular, for any $\epsilon > 0$,

$$P[\sup_\omega |f_T^*(\omega) - f(\omega)| > \epsilon] \le \frac{1}{\epsilon^2}\, E[\sup_\omega |f_T^*(\omega) - f(\omega)|^2]. \tag{9.7}$$

The three figures of merit defined by the right hand sides of (9.3), (9.4), and (9.6) are each the sum of two terms which may be said to represent respectively the contributions arising from variance and bias. In a later paper [P9] we show how one may obtain upper bounds for these quantities based on certain a priori information about the time series which must be either known or estimated. Using these upper bounds one may design an empirical spectral analysis so as to obtain estimates which achieve a preassigned figure of merit.

We conclude this paper by discussing what we mean by bias.

The problem of obtaining expressions for the bias of an estimate of the spectral density function is not entirely well defined. In order to evaluate the bias of an estimate $f_T^*(\omega)$ one must clearly state what is the quantity being estimated. In my work ([P1]-[P4]) I have regarded $f_T^*(\omega)$ as being an estimate of the value $f(\omega)$ of the spectral density function at the frequency $\omega$. Tukey ([T3], p. 402) on the other hand considers it unrealistic to attempt to estimate $f(\omega)$; rather he seems to imply that bias is always zero, since given any estimate $f_T^*(\omega)$ one should adopt as the quantity being estimated its mean $E[f_T^*(\omega)]$. I now take the view that while one may not be interested in estimating $f(\omega)$, one is interested in estimating a spectral average $J(A)$ corresponding to a spectral window which is very peaked around the frequency $\omega$ and has bandwidth of the order of the reciprocal of the sample size. The bias is then $E[f_T^*(\omega)] - J(A)$.

I claim further that it is possible to choose a typical spectral average $J(A)$ of which $f_T^*(\omega)$ may be regarded as the estimate. Consider the mean of the periodogram:

$$E[f_T(\omega)] = \frac{1}{2\pi}\int_{-T}^{T} e^{-iv\omega}\left(1 - \frac{|v|}{T}\right) R(v)\,dv = \int_{-\infty}^{\infty} T\, W(T(\omega - \lambda))\, f(\lambda)\,d\lambda \tag{9.8}$$

in which

$$W(u) = \frac{1}{2\pi}\left(\frac{\sin (u/2)}{u/2}\right)^2 = \frac{1}{2\pi}\int_{-1}^{1} e^{iux}\,(1 - |x|)\,dx. \tag{9.9}$$

From (9.8) one sees that $E[f_T(\omega)]$ is a spectral average corresponding to a spectral window with bandwidth of the order of $1/T$. We have seen in section 3 that it is natural to be interested in spectral averages over windows whose bandwidth is inversely proportional to the sample size. As a representative of all such spectral averages one may take the mean periodogram $E[f_T(\omega)]$. Consequently we define the bias of the estimate $f_T^*(\omega)$ defined by (3.14) to be

$$b[f_T^*(\omega)] = E[f_T^*(\omega)] - E[f_T(\omega)] = -\frac{1}{2\pi}\int_{-T}^{T} e^{-iv\omega}\,\{1 - k_T(v)\}\, E[R_T(v)]\,dv. \tag{9.10}$$
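It should be remarked that using (9.10) one arrives at exactly the same conclusions as if one defines the bias to be $E[f_T^*(\omega)] - f(\omega)$, since it may be shown that for all $T$

$$T \sup_\omega |E[f_T(\omega)] - f(\omega)| \le \frac{1}{2\pi}\left\{\int_{-\infty}^{\infty} |R(u)|\,du \int_{-\infty}^{\infty} u^2\, |R(u)|\,du\right\}^{1/2}. \tag{9.11}$$

If $k_T(v) = h(v/M_T)$, and $h(u)$ satisfies (5.17), then

$$\frac{1}{2\pi}\int_{-T}^{T} |1 - k_T(v)|\; |R(v)|\,dv \le h_q\, M_T^{-q}\, \frac{1}{2\pi}\int_{-\infty}^{\infty} |v|^q\, |R(v)|\,dv. \tag{9.12}$$

Consequently for an estimate of algebraic type

$$\sup_\omega |b[f_T^*(\omega)]| \le (M_T)^{-q}\, h_q\, S(q) \tag{9.13}$$

where $S(q)$ is an upper bound satisfying

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} |u^q R(u)|\,du \le S(q). \tag{9.14}$$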

From (9.13), one sees that bias and truncation point are inversely proportional, 
while variance and truncation point are directly proportional. It would thus 
appear that the best estimate $f_T^*(\omega)$ of the spectral density function is that
estimate whose bandwidth most effectively balances bias against variance so 
as to minimize mean square error. Nevertheless, the simple design relations 
given in section 7 (which ignore the question of bias) appear to be adequate 
for designing practical spectral analyses as long as one keeps clearly in mind 
that the estimates obtained are estimates of spectral averages corresponding 
to spectral windows whose bandwidth is of the order of the reciprocal of the 
truncation point rather than the reciprocal of the sample size. 
Discussion, Emphasizing the Connection Between
Analysis of Variance and Spectrum Analysis* 


John W. Tukey
Princeton University 


and 
Bell Telephone Laboratories 


INTRODUCTION 


This session was to be expository and to be directed to statisticians. Accord- 
ingly, the discussants have a responsibility to provide such comments as may 
tend to make both the two papers and the general subject more understandable 
to statisticians, particularly by relating spectrum analysis to statistical tech- 
niques, and to fields of application, more widely familiar to them. Fortunately, 
the connection between spectrum analysis and those aspects of the analysis 
of variance which emphasize variance components is extremely close. 

To make this connection evident, however, we shall have to analyze the 
implications and foundations of our procedures and thinking in classical analysis 
of variance more deeply than usual. It is fair to say that the spectrum analysis 
of a single time series is just a branch of variance component analysis, but 
only if one describes its main difference from the classical branches as a re- 
quirement for explicit recognition of what is being done and why. In classical 
(i.e. single-response analysis-of-variance) variance component analysis, one can 
(and most of us do) analyze data quite freely and understandingly with little 
thought about what is being done and why it is being done. This is, perhaps 
unfortunately, not the case for the time series analysis branch of variance 
component analysis. 


I: VARIANCE COMPONENTS AND SPECTRUM ANALYSIS 


When variance components? 


When conducting analyses of data in conventional analysis-of-variance pat- 
terns, we sometimes pay attention to individual values of main effects, inter- 
actions, and the like. At other times, we pay attention to estimates of variance 
components. The controlling factor in this choice is the character of the sets 
of data which would be considered to be other realizations of the same experi- 
ment (or of the same patterned observation). Thus, if we were comparing the 
times taken by the five outstanding runners of the world to run 1500 meters, 
another realization of the experiment would reasonably involve the same runners, 
and it would be appropriate to pay attention to individual main effects. If, 
however, we were considering the speeds for a standard assembly operation,
as shown by five assemblers drawn at random from a pool of 250 assemblers 
in a large factory, another realization of the same experiment would almost 
certainly involve a different group of assemblers. Consequently, in analyzing 
such data, we would pay attention to the estimated variance component for 
assemblers, since our concern would have been with assemblers as a whole 
rather than with five particular assemblers. (We are here concerned with the 
direct issue of what aspect of the classification concerned receives attention, 
not with the indirect, but perhaps equally important, issue of how the character 
of this classification affects the proper error term for other main effects—the 
question sometimes discussed in terms of “fixed, mixed, or random models’’.) 
There is a clear analog to this choice in the Fourier-oriented analysis of time 
series. 

Let us first consider the case of a function of time which is periodic with
known period. If we may choose the time unit for convenience, the period may
as well be $2\pi$, and the function will then have (in practice) a Fourier series
representation of the form

$$y(t) = a_0 + \sum_{j} (a_j \cos jt + b_j \sin jt).$$


Let us lay aside for the moment questions of errors of measurement, numbers
of (and spacings between) times at which observations are made, and whether j
has a finite or infinite range. Since we are statisticians, concerned with a
statistical problem, the coefficients $a_0, a_1, b_1, a_2, b_2, \ldots$ are not to be thought
of as constant, but rather as having some joint distribution. This joint
distribution reflects the functions corresponding to "all the realizations" of the
same experiment or observational program. At one extreme, the functions of
time representing different realizations might all be very nearly the same. If
this is the case, then, given a single realization, it is clearly appropriate to
concentrate our attention upon the estimated values of $a_0, a_1, b_1, a_2, b_2, \ldots$.
This is, of course, the situation envisaged in classical harmonic analysis. One
opposite extreme, one which you may claim only a statistician would think
of, occurs when there are parameters $\sigma_0^2, \sigma_1^2, \sigma_2^2, \ldots$ and the a's and b's are
independent normal deviates with ave $a_0$ = ave $a_j$ = ave $b_j$ = 0, var $a_0 = \sigma_0^2$,
var $a_j$ = var $b_j$ = $\sigma_j^2/2$. Given one realization of such an experiment, it is only
reasonable to look at quadratic functions of the observations, and to regard
them as telling us about $\sigma_0^2, \sigma_1^2, \sigma_2^2, \ldots$. Specifically it is appropriate to look
at $a_0^2,\ a_1^2 + b_1^2,\ a_2^2 + b_2^2, \ldots$ and at certain linear combinations of these
quantities. In contrast to classical harmonic analysis, this sort of periodic-time-
function problem is a variance component problem.
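
A small simulation makes the variance component reading concrete. This is only a sketch, assuming independent normal coefficients with var $a_j$ = var $b_j$ = $\sigma_j^2/2$; the function name, seed, and number of realizations are hypothetical choices made for illustration.

```python
import random

def average_power(sigma_sq, realizations=20000, seed=1):
    """Average of a_j**2 + b_j**2 over many simulated realizations of one harmonic."""
    rng = random.Random(seed)
    sd = (sigma_sq / 2.0) ** 0.5  # assumed: var a_j = var b_j = sigma_sq / 2
    total = 0.0
    for _ in range(realizations):
        a = rng.gauss(0.0, sd)
        b = rng.gauss(0.0, sd)
        total += a * a + b * b
    return total / realizations

# Across the ensemble the average of a_j**2 + b_j**2 settles near sigma_j**2,
# while a single realization (two degrees of freedom) scatters widely around it.
print(average_power(2.0))
print(average_power(2.0, realizations=1, seed=7))
```

The second call shows why one realization of $a_j^2 + b_j^2$ is a poor guide to $\sigma_j^2$ on its own, and why linear combinations over neighboring $j$ are of interest.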

The model which lies behind the classical tests of significance in harmonic 
analysis, a line of development finally completed by Fisher [1929], is an incom- 
plete mixture of the two we have just described, in which 


$$y_{\text{observed}}(t) = y_{\text{fixed}}(t) + y_{\text{random}}(t).$$


In this decomposition the ‘‘fixed’’ component is usually thought of as involving 
only one, two, or perhaps three values of j, while, both most importantly and 
most dangerously, the "random" component is thought of as having
$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_j^2 = \cdots$.

Equality of the $\sigma_j^2$, the analog for periodic functions of being a "white noise",
is exactly what would hold if the “random” component consisted only of in- 
dependent (or merely uncorrelated) observational errors in observations equally 
spaced through (0, 27). It is also, unfortunately, exactly what is most unlikely 
to occur in practice (for reasons to be discussed in a moment). As a
consequence, the practical application of such "largest value against all the rest"
tests of significance in harmonic analysis is, to say the least, extremely limited.
(If only our estimates, $a_j^2 + b_j^2$, of $\sigma_j^2$ had more than two degrees of freedom,
we could improve the classical tests of significance by fitting some sort of
reasonable dependence of $\sigma_j^2$ upon j, before proceeding to the construction of a
significance test. Even with only two degrees of freedom, some such improvement
may be possible.)

Thus, even in the case of periodic time functions, we have some situations 
which should be treated almost entirely in terms of means, others which should 
be treated entirely in terms of variance components, and still others where 
both descriptions should be used together. 


The character of time 


Time is connected. And functions of time reflect this fact in their structure, 
not only in the tendency toward continuity shown by individual time functions, 
but even more obviously in the associated probability structures. When a time 
function is wisely regarded as generated from constituents coming from dif- 
ferent sources, as most are, the individual constituents are not likely to be 
‘white noises.’’ (Not even the measurement error constituent!) And, even 
more crucially, the processes by which these constituents are combined are 
not likely to treat different frequencies alike, so that even if the constituents 
were white noises, their resultant would not be one. Both in the periodic case 
and the more usual and general case of a continuous spectrum, a random time 
function is rarely a "white noise".

Another characteristic of time is that it is quite frequently measured from 
an arbitrary origin. To be sure, if the simple periodic case has an annual period, 
we may place the computing origin of time where we will, but that will not 
make 1 January and 1 July the same. But if we are examining the harmonics 
of a 400-cycle electrical voltage, there is no equally necessary or special relation 
between local time and 400-cycle time. In a repetition of the same experiment, 
the generator phase at zero local time may well be equally likely to have any 
value between 0 and $2\pi$. And if this is so, the situation is a stationary one. (This
example may help to emphasize that stationarity is a condition “across the 
ensemble’’, a condition relating one realization to another, a condition on a 
whole ensemble; that it is not a condition on single realizations, and most spe- 
cifically is not a condition of steadiness within individual realizations.) 

Finally, phenomena in time are rarely periodic. (In fact, when examined 
under a microscope, no known phenomenon is precisely periodic.) Consequently, 
an effective Fourier description of real phenomena can rarely be a periodic
description. We must allow all frequencies to contribute, and hence, as Jenkins
has explained, must turn to a continuous spectrum. 

The statistically vital contrast between situations appropriately describable 
by means and situations appropriately describable by variances continues here, 
as we should have expected. The motion of a springboard from which a diver
has just leaped requires all frequencies for its description. The motions
lowing successive leaps by a single careful and precise diver will be relatively 
similar. They will, as a whole, probably be most appropriately described ‘‘by 
means’’, by a description of the typical time history of board motion. But if 
no diver is present, if the springboard is vibrating through a very small ampli- 
tude because the wind is blowing on the board and its supports, and because 
the ground itself is vibrating because of vehicle traffic and factory machinery, 
the situation is likely to be quite different. The characteristics of this
"noise-like" motion of the springboard which are maintained from one realization to
another are of the nature of variance components rather than means. And of
course (as when a big grasshopper jumps off a small wind- and traffic-vibrated
springboard) there are intermediate situations whose description appropriately
combines both means and variance components. 


Which variance components? 


Discussion has proceeded, up to this point, as though the statement of a 
problem automatically fixed a set of variance components. When we think 
matters over carefully, we find that this is far from being the case. In an ab- 
stract problem, where only the pattern of the observations and the symmetries 
of their distribution are specified, without any indication of their interpretation 
or understanding, there is no unique set of variance components. Instead there 
are many sets, each interconvertible by prescribable formulas into each other. 
Abstractly, the best we can do is to say that any set of quantities such that 
each of the second moments (pure and mixed) of the observations can be ex- 
pressed as a linear combination of the quantities of the set (together with, say, 
the square of the average of some general mean) can play the formal role of a 
system of variance components. (If the quantities in some set do not behave 
like variances we might prefer to call them (together with the squared average) 
second-moment components rather than variance components, though we shall 
not be concerned with this particular precision of language here.) Still one
set of variance components may be more convenient, and far more useful,
than another. Why?


Replicated double classifications 


If we examine one of the most classical patterns, a replicated double classi- 
fication into rows and columns, we can learn why. Let us, then, consider a 
classical analysis of variance, based on a pattern involving d observations in 
each of the $r \cdot c$ cells formed by crossing r rows with c columns. The analysis
of variance breakdown into sums of squares, degrees of freedom, and mean
squares is standard, as are the definitions of variance components. The
well-known formulas for the average values of mean squares are, if all population
sizes are infinite:

$$\begin{aligned}
\text{ave}\,\{MS \mid \text{rows}\} &= \sigma^2 + d\,\sigma_{RC}^2 + dc\,\sigma_R^2 \\
\text{ave}\,\{MS \mid \text{cols}\} &= \sigma^2 + d\,\sigma_{RC}^2 + dr\,\sigma_C^2 \\
\text{ave}\,\{MS \mid \text{int}\} &= \sigma^2 + d\,\sigma_{RC}^2 \\
\text{ave}\,\{MS \mid \text{dup}\} &= \sigma^2
\end{aligned}$$
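
Read in reverse, these four formulas give the usual moment estimates of the variance components. The following is a sketch under the infinite-population formulas for an r-by-c table with d observations per cell; the function names and the numbers in the example are hypothetical.

```python
def expected_mean_squares(sigma_sq, sigma_rc_sq, sigma_r_sq, sigma_c_sq, r, c, d):
    """Average values of the four mean squares for an r-by-c table with d replicates."""
    ms_dup = sigma_sq
    ms_int = sigma_sq + d * sigma_rc_sq
    ms_rows = ms_int + d * c * sigma_r_sq
    ms_cols = ms_int + d * r * sigma_c_sq
    return ms_rows, ms_cols, ms_int, ms_dup

def components_from_mean_squares(ms_rows, ms_cols, ms_int, ms_dup, r, c, d):
    """Solve the four linear relations for the variance components.

    Applied to observed mean squares these are moment estimates and can come
    out negative; applied to average values they recover the components exactly.
    """
    sigma_sq = ms_dup
    sigma_rc_sq = (ms_int - ms_dup) / d
    sigma_r_sq = (ms_rows - ms_int) / (d * c)
    sigma_c_sq = (ms_cols - ms_int) / (d * r)
    return sigma_sq, sigma_rc_sq, sigma_r_sq, sigma_c_sq

ms = expected_mean_squares(1.0, 0.5, 2.0, 3.0, r=4, c=5, d=2)
print(ms)
print(components_from_mean_squares(*ms, r=4, c=5, d=2))
```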


Why did we choose $\sigma^2$, $\sigma_{RC}^2$, $\sigma_R^2$ and $\sigma_C^2$ as the variance components in terms
of which we are to write out such formulas? We could, for example, have used
as variance components such average values of differences between differently
related pairs of observations as, taking $i \ne I$, $j \ne J$, $k \ne K$:

$$\begin{aligned}
&\text{ave}\,(y_{ijk} - y_{ijK})^2 \\
&\text{ave}\,(y_{ijk} - y_{iJK})^2 \\
&\text{ave}\,(y_{ijk} - y_{IjK})^2 \\
&\text{ave}\,(y_{ijk} - y_{IJK})^2
\end{aligned}$$


Before trying to answer these questions we must look back at some of the 
implications of the way in which they were asked. 

The term “variance component”’ can be, and is, appropriately used in two 
different senses. These senses differ in effect, but only when the underlying 
situations differ, so that no contradictions arise. When the underlying situation 
is such that it is appropriate to consider means in the first instance (the pigeon- 
hole model of Cornfield and Tukey 1956 includes such extreme examples), 
variance components are means over more specific quadratic quantities. In 
particular, the within-cell or "duplication" variance component $\sigma^2$ is the average
of the variances of all the cell populations. If these cell-population variances
differ from cell to cell, so too do the values of

$$\text{ave}\,(y_{ijk} - y_{ijK})^2$$


since these averages will always be twice the variance of the population in the 
corresponding cell. 

When the underlying situation is at the other extreme, so that only variance 
components should be considered, then the labels upon the rows and columns 
can wisely be regarded as purely arbitrary. This means that if the same “‘in- 
dividual” were to appear as a row in each of two realizations of the same experi- 
ment, the numbers labeling the two rows would be quite unrelated. Such lack 
of relationship could be in the nature of the situation, or could have been en- 
forced by our insistence on a randomization of the row numbers, separately 
for each realization, before the data was made available for analysis. But if 
the labels are arbitrary, we cannot think of one cell, considered by itself, as 
different from another. Similarly, there will be only four kinds of pairs of cells: 
identical; in same column but not in same row; in same row but not in same 
column; in different rows and columns. And the four corresponding average
square differences would have the following values:

$$\begin{aligned}
\text{ave}\,(y_{ijk} - y_{ijK})^2 &= 2\sigma^2 \\
\text{ave}\,(y_{ijk} - y_{iJK})^2 &= 2\sigma^2 + 2\sigma_{RC}^2 + 2\sigma_C^2 \\
\text{ave}\,(y_{ijk} - y_{IjK})^2 &= 2\sigma^2 + 2\sigma_{RC}^2 + 2\sigma_R^2 \\
\text{ave}\,(y_{ijk} - y_{IJK})^2 &= 2\sigma^2 + 2\sigma_{RC}^2 + 2\sigma_R^2 + 2\sigma_C^2
\end{aligned}$$

Knowing either set of four quantities, either the 4 average squared differences,
or $\sigma^2$, $\sigma_{RC}^2$, $\sigma_R^2$, and $\sigma_C^2$, the other set is very easily calculated.
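
The interconversion between the two sets can be written out explicitly. This is a minimal sketch, assuming the random-effects pattern in which each average squared difference is twice the sum of the components not shared by the two observations; the function and argument names are illustrative.

```python
def squared_differences(sigma_sq, sigma_rc_sq, sigma_r_sq, sigma_c_sq):
    """Average squared differences for the four kinds of pairs of observations."""
    same_cell = 2 * sigma_sq
    same_row = 2 * sigma_sq + 2 * sigma_rc_sq + 2 * sigma_c_sq   # columns differ
    same_col = 2 * sigma_sq + 2 * sigma_rc_sq + 2 * sigma_r_sq   # rows differ
    unrelated = 2 * sigma_sq + 2 * sigma_rc_sq + 2 * sigma_r_sq + 2 * sigma_c_sq
    return same_cell, same_row, same_col, unrelated

def variance_components(same_cell, same_row, same_col, unrelated):
    """Invert the relations above to recover the four variance components."""
    sigma_sq = same_cell / 2
    sigma_r_sq = (unrelated - same_row) / 2
    sigma_c_sq = (unrelated - same_col) / 2
    sigma_rc_sq = (same_row + same_col - unrelated - same_cell) / 2
    return sigma_sq, sigma_rc_sq, sigma_r_sq, sigma_c_sq

diffs = squared_differences(1.0, 0.5, 2.0, 3.0)
print(diffs)
print(variance_components(*diffs))
```

Both directions are linear, which is the sense in which the two sets are arithmetically equivalent; the preference between them has to rest on interpretation.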

Why then do we prefer the first set, since they are arithmetically equivalent? 
It must be because of some matter of interpretation. And the interpretation 
must involve not the realizations of a single experiment but the comparison 
of two or more different experiments. In fact, we feel that, for example, the 
sort of change of circumstances which halves or doubles $\sigma_R^2$ while leaving $\sigma^2$,
$\sigma_{RC}^2$, and $\sigma_C^2$ unaffected is easier to understand than the sort which changes
ave $(y_{ijk} - y_{iJK})^2$ without affecting its three fellows.

The prime criterion for selecting useful variance components is that we should
be more easily able to understand the changes in the situation which would change
some variance components while leaving others alone.


Known-period time functions 


Let us now consider periodic time functions with a fixed period and a
stationary joint distribution. One variance component description has already
been given in terms of $\sigma_0^2, \sigma_1^2, \sigma_2^2, \ldots$. (Normality is a matter of indifference
to us in the present instance.) Another can be given in terms of Jowett's serial
variance function [Jowett 1955]:

$$V_h = \tfrac{1}{2}\,\text{ave}\,\big(y(t + h) - y(t)\big)^2$$

which, on account of stationarity, must be the same for all values of t. The
formal relation between these two schemes is easily found to be:

$$V_h = \sum_{j} \left(\sin \tfrac{1}{2} jh\right)^{2} \sigma_j^2.$$
The formal similarities between the two pairs of mutually related variance- 
component schemes, one for the replicated two-way table, and the other for 
stationary periodic time series, are very striking, but the actual similarities 
go deeper. 

What are the simplest changes which we can contemplate making in a situa- 
tion involving stationary periodic time functions? They are the results of such 
simple linear operations as the result of passing an electrical voltage through 
a simple circuit consisting of resistances, condensers, and inductances, or the 
result of passing a mechanical motion through a simple linkage of springs, 
masses, and dash pots. (Such processes occur, in particular, in almost every 
physical or chemical measuring instrument.) Any such linear process will affect 
the amplitude and phase of each harmonic in a characteristic way. If its effect 
on a pure jth harmonic would be to multiply amplitude by $|L_j|$, then the
jth variance component of any stationary ensemble of periodic time series
(with period $2\pi$) will be multiplied by $|L_j|^2 = L_j L_j^*$. There is no
correspondingly simple result for the serial variance function. Consequently, the frequency-
related variance components are much more useful than serial variance functions 
in dealing with stationary ensembles of fixed-period time functions. 

(In highly mathematical language, the frequency variance components are 
a basis for second moments which simultaneously diagonalize the effects of all 
operations that are linear and time-shift-invariant—all black boxes in the 
sense of pp. xyz-uvw.) 


It can be done with covariances! 


The discussion just given stressed the analogy between classical analysis of 
variance and the analysis of stationary periodic time series by using averages 
of squares of differences of observations in both situations. It would have been
possible to stress the analogy almost equally well by using covariances
in both situations. In the replicated row-by-column pattern, we have, when
the covariances are taken across the specification, from one realization to another, 
WITH AN ENTIRE NEW SAMPLE OF ROWS AND COLUMNS IN 
EACH REALIZATION: 


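$$\begin{aligned}
\text{cov}\,\{y_{ijk}, y_{ijk}\} &= \sigma^2 + \sigma_{RC}^2 + \sigma_R^2 + \sigma_C^2, \\
\text{cov}\,\{y_{ijk}, y_{ijK}\} &= \sigma_{RC}^2 + \sigma_R^2 + \sigma_C^2, \\
\text{cov}\,\{y_{ijk}, y_{iJK}\} &= \sigma_R^2, \\
\text{cov}\,\{y_{ijk}, y_{IjK}\} &= \sigma_C^2.
\end{aligned}$$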


These covariances across the ensemble are quite analogous to the serial co- 
variances in the time series case, which are given by 


$$R(h) = \text{cov}\,\{y(t),\ y(t + h)\}$$


where the covariance is again across the ensemble, from realization to realization, 
and whose relation to the frequency variance components is, formally, 


$$R(h) = \sigma_0^2 + \tfrac{1}{2}\sum_{j} (\cos jh)\,\sigma_j^2.$$
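
The two time-domain descriptions, serial covariance and serial variance function, are tied together by $V_h = R(0) - R(h)$. A short numerical sketch follows; it assumes the convention var $a_j$ = var $b_j$ = $\sigma_j^2/2$ (so that each harmonic contributes $\tfrac{1}{2}\sigma_j^2 \cos jh$ to the covariance), and the function names and sample values are hypothetical.

```python
import math

def serial_covariance(h, sigma0_sq, sigma_sq):
    """R(h) = sigma_0^2 + (1/2) * sum_j sigma_j^2 * cos(j*h), with j = 1, 2, ..."""
    return sigma0_sq + 0.5 * sum(
        s * math.cos(j * h) for j, s in enumerate(sigma_sq, start=1)
    )

def serial_variance(h, sigma_sq):
    """V_h = sum_j sigma_j^2 * sin(j*h/2)**2, with j = 1, 2, ..."""
    return sum(
        s * math.sin(j * h / 2.0) ** 2 for j, s in enumerate(sigma_sq, start=1)
    )

sigma0_sq = 1.5
sigma_sq = [4.0, 1.0, 0.25]  # sigma_1^2, sigma_2^2, sigma_3^2
for h in (0.0, 0.3, 1.0, 2.5):
    r0 = serial_covariance(0.0, sigma0_sq, sigma_sq)
    print(h, serial_variance(h, sigma_sq), r0 - serial_covariance(h, sigma0_sq, sigma_sq))
```

Note that $\sigma_0^2$ drops out of $V_h$: differences of observations are unaffected by the random level $a_0$.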


The main reason for approaching the analogy in terms of averages of squared 
differences is a pedagogical one. It seems to be easier to think about the averages 
of squared differences, when working from one realization to another. After 
all, as statisticians we are quite used to thinking about the average value of 
some quantity we have managed to measure only once. But it is a much further 
cry to think about a covariance of two quantities, each of which has been
measured only once.

The qualitative nature of this distinction between covariances and average 
squared differences is notably different for the replicated double classification 
and for stationary ensembles of periodic time series. This is due, in large part, 
to our tendency to expect the versions of classifications to have names, to try 
to think in terms of situations where means and main effects are more important 
than variance components. We feel that if, for example, i is a subscript
identifying persons, that i = 3 should refer to a particular person, not to the third
row of some randomly arranged data array. 

Yet in a situation where a pure variance component approach is appropriate, 
the process of randomly rearranging the rows of the data array generates what 
we may think of, without doing too much violence to the situation, as a new 
(but clearly not independent) repetition of the experiment. If we fix our eyes 
on particular values of i, j, k, I, J, and K, consider all admissible
rearrangements of the data array, and then average the simplest quadratic expressions,
we are led to suitable symmetric functions of the original data array which are 
natural estimates of the covariances across the ensemble, provided the latter 
are given an averaged interpretation. 

The usual practice in the spectrum analysis of a single stretch of time series 
is entirely analogous to such a procedure. Let us, for example, consider
estimating cov $(y_1, y_4)$. We have the original observations $y_1, y_2, y_3, y_4, y_5, \ldots$.
The results of shifting the time origin, one unit at a time, and always dropping
observations at negative times, are first $y_2, y_3, y_4, y_5, y_6, \ldots$, then $y_3, y_4,
y_5, y_6, y_7, \ldots$ and so on. The pairs $(y_1, y_4), (y_2, y_5), (y_3, y_6), \ldots, (y_t, y_{t+3})$
are ‘“‘equivalent’’ (either because stationarity is assumed or because we want 
an averaged covariance) and we can calculate a ‘‘sample’’ covariance from these 
pairs. Such processes of imitating the sought-for covariance across the ensemble 
with a sample ‘‘covariance’’ wandering around the data pattern are inevitable 
when only a single realization is available, be it in an analysis-of-variance 
situation or a time series situation. 
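
The "wandering" sample covariance can be sketched in a few lines. The divisor (the number of available pairs) and the separate centering of the two members of each pair are assumptions made for illustration; other conventions are in use, and the function names are hypothetical.

```python
def lagged_pairs(y, lag):
    """All pairs (y[t], y[t + lag]) obtainable by sliding along a single stretch."""
    return list(zip(y[:len(y) - lag], y[lag:]))

def sample_lagged_covariance(y, lag):
    """Sample covariance computed from the 'equivalent' pairs at a given lag."""
    pairs = lagged_pairs(y, lag)
    if not pairs:
        raise ValueError("lag is too large for this stretch of data")
    n = len(pairs)
    mean_first = sum(a for a, _ in pairs) / n
    mean_second = sum(b for _, b in pairs) / n
    return sum((a - mean_first) * (b - mean_second) for a, b in pairs) / n

stretch = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(lagged_pairs(stretch, 3))
print(sample_lagged_covariance(stretch, 3))
```

Every pair comes from the same realization, so the result imitates, rather than samples, the covariance across the ensemble.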

(In the time series situation, if and when we look more deeply into the details 
of the situation, we may find that the averages of squares of differences indeed, 
as Jowett has suggested [1955, 1957, 1958], have real advantages over co- 
variances, insofar as problems associated with trends and very low frequencies 
are concerned. But this is for the future to reveal.) 


Black boxes and the general case 


A discussion exactly analogous to the one just given for stationary ensembles 
of period-$2\pi$ time series can be given for the general case of a stationary ensemble
of time series. We shall not attempt to give details here, trying only to hit 
the high points. 

There are many circumstances under which it is convenient to call any pro- 
cedure or process (be it computational, physical, or conceptual) which converts 
an input to an output a black box. In dealing with time series it is convenient 
to restrict the term black box to procedures or processes which satisfy two 
further conditions: 


(1) The output corresponding to the superposition of two inputs is the super- 
position of the corresponding outputs. 

(2) The only effect of delaying an input by a fixed time is to delay the out- 
put by the same time. 


If the procedure or process departs from one or both of these conditions, it is 
conveniently called a colored box, using specific colors when specific sorts of 
departures are permitted. Some examples of black boxes include: 
(a) moving averages, such as

$$z_t = \frac{1}{n}\,\{y_{t-n+1} + y_{t-n+2} + \cdots + y_t\},$$

(b) time delays,

(c) differences, such as

$$z_t = y_t - y_{t-1},$$

(d) more general moving linear combinations

$$z_t = a_0 y_t + a_1 y_{t-1} + \cdots + a_n y_{t-n},$$


(e) linear electric networks (which may include amplifiers, transmission 
lines, and wave guides), 

(f) linear mechanical systems, 

(g) linear economic systems, 

(h) differentiation with respect to time, 

(i) integration with respect to time. 


Clearly many of the most important computational, physical, and conceptual 
processes are black boxes in this sense. 
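
The two defining conditions can be checked directly for a moving linear combination, and seen to fail for a simple nonlinear device. This is a sketch on hypothetical data and weights; only outputs with full overlap of the weights are computed, and the squaring device is an illustrative "colored box" not taken from the text.

```python
def moving_combination(y, weights):
    """z_t = sum_k weights[k] * y[t - k], for t = n, ..., len(y) - 1, n = len(weights) - 1."""
    n = len(weights) - 1
    return [sum(w * y[t - k] for k, w in enumerate(weights)) for t in range(n, len(y))]

def is_close(u, v, tol=1e-12):
    """True when two sequences have equal length and agree element by element."""
    return len(u) == len(v) and all(abs(a - b) <= tol for a, b in zip(u, v))

def squarer(y):
    """A nonlinear device, used here as an example of a box that is not black."""
    return [v * v for v in y]

x = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]
y = [2.0, 0.0, 1.0, 3.0, 9.0, 4.0, 6.0, 5.0]
weights = [0.5, 0.3, 0.2]
summed = [a + b for a, b in zip(x, y)]

# Condition (1): the output for x + y is the sum of the separate outputs.
superposition_holds = is_close(
    moving_combination(summed, weights),
    [a + b for a, b in zip(moving_combination(x, weights), moving_combination(y, weights))],
)

# Condition (2): delaying the input by 2 steps delays the output by 2 steps.
delay = 2
delayed = [0.0] * delay + x
shift_holds = is_close(moving_combination(delayed, weights)[delay:], moving_combination(x, weights))

# The squarer violates condition (1).
squarer_superposes = is_close(squarer(summed), [a + b for a, b in zip(squarer(x), squarer(y))])

print(superposition_holds, shift_holds, squarer_superposes)
```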

It is easy to show (if we grant a small amount of continuity and a sufficient 
lack of dependence of present output on what happened at $t = -\infty$) that,
if the input to a black box is $A \cos(\omega t + \delta)$, then the output has to take the
form $G(\omega) \cdot A \cos(\omega t + \delta + \varphi(\omega))$, where the amplification $G(\omega)$ and the phase
shift $\varphi(\omega)$ depend only upon $\omega$. This brings every black box into the framework
discussed by Jenkins, so that

$$(\text{spectrum of output}) = [G(\omega)]^{2} \cdot (\text{spectrum of input}).$$


The important thing about this relation, for our present purposes, is that the 
variance component associated with a single frequency (or narrow band of 
frequencies) in the output is determined by the corresponding variance com- 
ponent of the input. There is no mixing up of frequency variance components. 
This is simultaneously true for all black boxes, and is the basic reason why the 
user, be he physicist, economist, or epidemiologist, almost invariably finds 
frequency variance components the most satisfactory choice for any time 
series problem which should be treated in terms of variance components. 
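
For a moving linear combination the amplification and phase shift can be computed and checked against the filtered cosine. The sketch below assumes the discrete-time transfer function $L(\omega) = \sum_k w_k e^{-i\omega k}$, with $G(\omega) = |L(\omega)|$ and phase shift $\arg L(\omega)$; the weights, frequency, and names are hypothetical.

```python
import cmath
import math

def transfer_function(weights, omega):
    """L(omega) = sum_k weights[k] * exp(-i * omega * k) for z_t = sum_k weights[k] * y[t - k]."""
    return sum(w * cmath.exp(-1j * omega * k) for k, w in enumerate(weights))

def moving_combination(y, weights):
    """Outputs z_t for t = n, ..., len(y) - 1, where n = len(weights) - 1."""
    n = len(weights) - 1
    return [sum(w * y[t - k] for k, w in enumerate(weights)) for t in range(n, len(y))]

weights = [0.25, 0.25, 0.25, 0.25]        # a four-term moving average
amplitude, omega, delta = 3.0, 0.7, 0.4
response = transfer_function(weights, omega)
gain, phase = abs(response), cmath.phase(response)

length = 50
n = len(weights) - 1
cosine = [amplitude * math.cos(omega * t + delta) for t in range(length)]
filtered = moving_combination(cosine, weights)
predicted = [gain * amplitude * math.cos(omega * t + delta + phase) for t in range(n, length)]
max_error = max(abs(a - b) for a, b in zip(filtered, predicted))

# The variance component at this frequency is multiplied by gain**2.
print(gain, phase, gain ** 2, max_error)
```

The output at frequency $\omega$ depends only on the input at $\omega$, which is the "no mixing up" property on which the preference for frequency variance components rests.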


II: OTHER ANALOGIES





I hope that Part I has made the close relationship between spectrum analysis
of a single time series and variance component analysis very much clearer. 
There are similar analogies to other classical techniques. These are worthy of 
mention here, even though we cannot take the space to describe them in detail. 

Even though the cross-spectrum analysis of two or more time series was 
not discussed in this session (in part because an understanding of the spectrum 
analysis of one time series is an essential prerequisite), it is important to point 
out that probably the most important aspects of cross-spectrum analysis are
cases of (complex-valued, frequency-dependent) regression analysis in which 
the analog of a regression coefficient is the ratio of a (complex-valued) cross- 
spectrum density to a spectrum density, and is estimated by the corresponding 
ratio of estimates of averaged densities. (This fact will not surprise those who 
recall that a simple regression coefficient is estimated as the ratio of a sample 
covariance to a sample variance, or that a structural regression coefficient is 
sometimes estimated as the ratio of a sample covariance component to a sample 
variance component.) In studying time series, as in its more classical situations, 
regression analysis, whenever there is a suitable regression variable, is a more 
sensitive and powerful form of analysis than variance component analysis. 
As a consequence, one major reason for learning about spectrum analysis is 
as a foundation for learning about cross-spectrum analysis. 

The other approaches to data associated, directly or indirectly, with the 
analysis of variance and the name of R. A. Fisher also have their analogs in 
the analysis of time series. We have already noted, for example, how classical 
harmonic analysis is the appropriate approach to known-period time functions 
when the over-all situation is such that one should look at means rather than 
at variances. 

In dealing with the mean-like behavior of nonperiodic time functions from 
a Fourier point of view, a natural and effective approach is furnished by complex 
demodulation in which the given stretch of data $\{X_t\}$ is first converted into
two stretches of (real) values, viz.

$$\{X_t \cos \omega_0 t\} \quad \text{and} \quad \{X_t \sin \omega_0 t\}$$

which can usefully be regarded as the real and (+ or −) imaginary parts of
one or the other of the complex stretches of data

$$\{X_t\, e^{i\omega_0 t}\} \quad \text{or} \quad \{X_t\, e^{-i\omega_0 t}\}.$$
The second step is to smooth the two real-valued stretches, smoothing both 
in the same way. The simplest smoothing process is the formation of equally- 
weighted “moving averages’, but it is often desirable to use weights which 
taper down at each end appropriately. The final step is to display the result 
in various ways, including: 
(1) Plotting individual stretches of smoothed values against time. 
(2) Plotting corresponding smoothed values against one another, using time 
as a parameter. 
(3) Plotting against time the phase or the magnitude of the complex number 
whose real and imaginary parts are the corresponding smoothed values. 


The interpretation of such plots is usually guided by an understanding of
what happens if a particular single frequency or band of frequencies is
prominent in the original data. If the original data were simply $X_t = A \cos(\omega t + \varphi)$,
then the values of the two modulation-product stretches would be

$$\begin{aligned}
X_t \cos \omega_0 t &= \tfrac{1}{2} A \cos[(\omega - \omega_0)t + \varphi] + \tfrac{1}{2} A \cos[(\omega + \omega_0)t + \varphi] \\
X_t \sin \omega_0 t &= -\tfrac{1}{2} A \sin[(\omega - \omega_0)t + \varphi] + \tfrac{1}{2} A \sin[(\omega + \omega_0)t + \varphi]
\end{aligned}$$
and the result of smoothing these would be to nearly eliminate both terms if
$\omega$ was not near $\omega_0$, and to nearly eliminate the terms in $(\omega + \omega_0)t + \varphi$ if $\omega$ is
near $\omega_0$. The results of smoothing, then, would, if $\omega$ is near $\omega_0$, be close to

$$[\tfrac{1}{2} A \cdot G(\omega - \omega_0)]\,\cos[(\omega - \omega_0)t + \varphi]$$

and

$$-[\tfrac{1}{2} A \cdot G(\omega - \omega_0)]\,\sin[(\omega - \omega_0)t + \varphi]$$

where $G(\omega - \omega_0)$ is the magnitude of the transfer function of the smoothing
process (which we have assumed to use symmetrical weights and thus not to
affect phase). In this simple case, a cosinusoidal variation of angular frequency
$\omega$ in the original, which may have been quite effectively concealed by larger
contributions at other frequencies, has been demodulated, and appears as a
cosinusoidal variation at the very much reduced angular frequency $\omega - \omega_0$,
which is likely to be much more evident to the eye. (Complex demodulation,
the calculation and smoothing of two stretches of modulation-products, is
necessary if we are to distinguish the results of demodulating $\cos(\omega_0 + \Delta)t$ from
the results of demodulating $\cos(\omega_0 - \Delta)t$.)
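
The three steps (form the two modulation products, smooth both in the same way, inspect magnitude or phase) can be sketched as follows. The equally weighted moving average, its length, and the test signal are assumptions chosen for illustration; tapered weights would often be preferred, and all names are hypothetical.

```python
import math

def moving_average(x, n):
    """Equally weighted moving average over n consecutive values (n odd gives symmetric weights)."""
    return [sum(x[i:i + n]) / n for i in range(len(x) - n + 1)]

def complex_demodulate(x, omega0, n):
    """Smoothed cosine and sine modulation products of the stretch x at angular frequency omega0."""
    cos_products = [v * math.cos(omega0 * t) for t, v in enumerate(x)]
    sin_products = [v * math.sin(omega0 * t) for t, v in enumerate(x)]
    return moving_average(cos_products, n), moving_average(sin_products, n)

def moving_average_gain(nu, n):
    """Magnitude of the transfer function of the n-term equally weighted average."""
    if nu == 0.0:
        return 1.0
    return abs(math.sin(n * nu / 2.0) / (n * math.sin(nu / 2.0)))

amplitude, omega, omega0, phi, n = 2.0, 1.05, 1.0, 0.3, 61
x = [amplitude * math.cos(omega * t + phi) for t in range(600)]
smoothed_cos, smoothed_sin = complex_demodulate(x, omega0, n)

# Magnitude of the complex number whose parts are the two smoothed values.
magnitudes = [math.hypot(c, s) for c, s in zip(smoothed_cos, smoothed_sin)]
expected = 0.5 * amplitude * moving_average_gain(omega - omega0, n)
print(min(magnitudes), max(magnitudes), expected)
```

In this example the magnitude stays close to $\tfrac{1}{2} A \cdot G(\omega - \omega_0)$; the small ripple around it is the incompletely removed term at $\omega + \omega_0$.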

This technique is the natural extension to the non-periodic case of the ideas 
underlying the classical Buys-Ballot table [e.g. Stumpff 1937, pp. 132ff or 
Burkhardt 1904, pp. 678-679], the so-called secondary analysis, and Bartels’s 
summation dial [Chapman and Bartels 1940, pp. 593-599 or Bartels 1935 pp. 
30-31]. It has to be tried out on actual data before its incisiveness and power
are adequately appreciated.

Problems involving the simultaneous behavior of more than two time series 
have not been worked on in a wide variety of fields of application, but enough 
has been done to point the way and suggest the possibilities. There will be an 
increasing number of instances where the corresponding non-time-series problems 
would be naturally approached by multiple regression. These can be effectively 
approached by multiple cross-spectrum and spectrum techniques which will 
be precise analogs of multiple regression in spirit and, if care is taken in choice, 
in the algebraic form of their basic equations. The differences which will arise 
in the development will stem from: 


(1) the fact that regression goes on separately at each frequency (which 
produces merely an extensive parallelism of results), and 

(2) the fact that regression coefficients will now take complex values rather 
than real values (which enables us to learn a little bit more about the 
underlying situation). 
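
A separate, complex-valued regression at each frequency can be sketched as follows. This is only an illustration under stated assumptions: the white-noise input, the gain of 0.5, the delay of 2 samples (made circular within each segment to keep the example exact), and the use of segments as replicates are all hypothetical choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_seg, seg_len, delay, gain_true = 200, 64, 2, 0.5

# Input series cut into segments (one per row); white noise is an assumption.
x = rng.standard_normal((n_seg, seg_len))
# Output: a delayed, scaled copy of the input plus independent noise.
noise = 0.1 * rng.standard_normal((n_seg, seg_len))
y = gain_true * np.roll(x, delay, axis=1) + noise

X = np.fft.rfft(x, axis=1)
Y = np.fft.rfft(y, axis=1)

# Least-squares regression of Y on X, carried out separately at each
# frequency, with the segments playing the role of replicate observations.
b = np.sum(np.conj(X) * Y, axis=0) / np.sum(np.abs(X) ** 2, axis=0)

freqs = np.fft.rfftfreq(seg_len)                   # cycles per sample
H_true = gain_true * np.exp(-2j * np.pi * freqs * delay)
max_err = np.max(np.abs(b - H_true))
print(max_err)
```

The magnitude of each coefficient estimates the gain at that frequency, and its argument estimates the phase shift, which is the extra information a real-valued regression coefficient could not carry.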


To my knowledge the multiple-time-series analogs of discriminant functions 
and canonical variates have not yet arisen in practice. But there would seem 
to be no difficulty in analogizing either or both. 


III: PARSIMONY AND ERROR TERMS


Parsimony

It appears to be natural to try to set up statistical problems in such a way 
that the numerical values of only a few characteristics, each easily estimated 
from the observations, suffice to complete the fixing of a probability model 
for the situation. And it appears all too natural to feel that such presuppositions 
as normality or constancy of variance are important, since, if they failed to
hold, the whole situation would not be completely fixed by the values of those 
characteristics which are easily estimated. But, for all such naturalness, the 
working statistician knows that it is often useful to estimate the mean of a 
population whose variance is unknown, and, similarly that it is often useful 
to estimate the variance of a population that is non-normal (frequently without 
trying to assess the nature and amount of its non-normality). For characteristics 
to be usefully estimated, it is not necessary that their values complete a pre- 
cisely stated model, although it is frequently the case that results about de- 
signing an experiment are only precise in such simple situations. Thus the 
famous telephone query, “I’m going to do an experiment, how many sheep 
should I use?” cannot be answered when all else that is known is that the ex- 
perimenter wants to compare the means of two treatments to a precision of 
+1.5 pounds of body weight, or that he wants to assess a simple variance to 
+10% of itself. In the first of these instances, precise design would require 
a precise variance of observation. In the second, precise design would require 
precise knowledge of distributional shape. Yet experiments can be and are, 
wisely, if not optimally, designed and validly analyzed in the absence of such 
precise information. 
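
The first half of the sheep query can be made concrete with the usual large-sample formula. The sketch below assumes a known standard deviation, a normal approximation, and equal group sizes; the function name and the trial values of the standard deviation are hypothetical. It shows how strongly the answer depends on a quantity the caller did not supply.

```python
import math
from statistics import NormalDist

def sheep_per_group(sigma, half_width, confidence=0.95):
    """Animals per treatment so that the difference of two means has roughly
    the requested confidence half-width, treating sigma as known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Var(difference of means) = 2 * sigma^2 / n; solve z * sd = half_width.
    return math.ceil(2 * (z * sigma / half_width) ** 2)

# The same +/- 1.5 pound target under three guesses at the standard deviation.
for sigma in (3.0, 5.0, 10.0):
    print(sigma, sheep_per_group(sigma, 1.5))
```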

Insofar as normality is needed only (i) to ensure that knowledge of the spec- 
trum would leave nothing else to learn, or (ii) to ensure that pre-experimental 
assessments of variability are precise, and these are the only reasons why Jenkins 
is concerned with normality, normality is not of great practical importance 
in spectrum analysis. 

(It is fortunate that normality is moderately closely approximated to in 
certain applications, since there are further branches of time series analysis, 
for example those dealing with numbers of upcrosses or numbers of maxima, 
for which normality is of crucial importance. Sequences of zeros and ones 
represent one ultimate expression of non-normality. In some instances, such
sequences are usefully studied by spectrum analysis, in others they are not.
The difference has to do with which aspects of their behavior are important.)

Indeed there is a very general principle of data analysis upon which all exami- 
ners of main effects (in analyses of variance) lean, whether they know it or not. 
This can be boldly stated as the Principle of Parsimony, viz. IT MAY PAY 
NOT TO TRY TO DESCRIBE IN THE ANALYSIS THE COMPLEXITIES 
THAT ARE REALLY PRESENT IN THE SITUATION. Every time that 
one pays attention to main effects alone, whether because they are so much 
larger than interactions, or because the interactions cannot be estimated with 
sufficient precision, or for almost any other reason, one is behaving in accord 
with this principle. Thus this principle is widely, though usually implicitly, 
adopted. The same principle applies to the quadratic analysis of time series, 
to spectrum analysis and its relatives, not just in a single way, but in some 
three or four separate and distinct ways: 


Normality 


The first application is to the need, or lack of need, for estimation to a complete 
specification, for either assuming normality or estimating more complex matters 
than the spectrum. In most practical situations this need is nonexistent. Knowledge
about the spectrum of a probably non-normal ensemble of time-functions
can be useful, just as knowledge about the mean of a population of imprecisely 
known variance can be useful. (In either case, once the data has been gathered, 
consistency of repetition is the appropriate basis for judging the stability of 
the result, not assumptions about normality or known variance.) 


Stationarity 


The second application of the general principle is to the assumption of station- 
arity, the analog in time series situations to the assumption of constancy of 
variance in more classical situations. The assumption of stationarity is one 
at which the innocent boggle, sometimes even to the extent of failing to learn 
what the data would tell them if asked. Yet I have yet to meet anyone experienced 
in the analysis of time series data (Gwilym Jenkins is an outstanding example) 
who is over-concerned with stationarity. All of us give some thought to both 
possible and likely deviations from stationarity in planning how to collect or 
work up data, but no one of us will allow the possibility of nonstationarity to 
keep us from making estimates of an average spectrum, any more than working 
analysis-of-variance statisticians will refrain from estimating a variance com- 
ponent because the variability thus assessed may well have to be an average. 

The fact that the spectrum is changing with time (or elevation, or azimuth) 
need not make it unwise to estimate one, or several, average spectra. The de- 
tection of waves 1 millimeter high, 1 kilometer long, with a 10,000 kilometer 
fetch (Munk and Snodgrass 1957) was based upon estimates of spectra averaged 
over four-hour periods. The crucial point in identifying the length of fetch 
was the rate of change of the center frequency of this distinctive, but very 
small peak, from one four-hour period to another. Once we admit that we are 
estimating an average spectrum, we have admitted that there may well be 
other relevant characteristics of the situation beyond the spectrum, that esti- 
mation is not completing specification. Such an admission, as this example 
shows, is a good thing rather than a bad one. 

There seems to be extra reluctance to consider an average spectrum. It is 
hard to be sure of the principal reasons for this, but a well-founded desire for 
replication as a basis of security is likely to be one. If only one time series is 
available for analysis, as is far too often the case in so many economic instances, 
it is comforting to believe that, somehow, stationarity makes it possible to have 
“replication” from one time period to another. The truth is not so comforting.
Stationarity is frequently absent. Even when stationarity holds, something 
like “replication” can only occur within the limits of a single stretch of moderate 
length if the true spectrum is devoid of detailed features (is sufficiently smooth 
in the small). And it is surely not wise to trust in “replication” that may not 
be there. 

Harry Press notes (private communication) that average spectra may hide 
an important departure from stationarity. In an entirely similar way, the use 
of analysis of variance on the results of an experiment comparing 12 treatments 
in randomized blocks may hide a substantial dependence of variability upon 
treatment, or a substantial dependence of treatment effect upon block. These 
things can, and do happen. The possibility of their occurrence must be carefully
kept in mind. But this fact is not relevant to the point we have just been dis- 
cussing. 

Surely, if one has both adequate data and scientific or insightful ground to 
fear nonstationarity, it will be wise not to average spectra over too long a time. 
But the urge to choose the averaging time wisely is strengthened by an under- 
standing that all data analyses estimate average spectra. 


Wisely-chosen resolution 


The third application of the general principle is to the question of the nar- 
rowness of the frequency ranges for which we should seek spectrum estimates. 
There are infinitely many frequencies. The number of separate frequencies 
over which we could seek estimates from a given body of data is limited by 
the extent of the data, and grows without limit as longer and longer pieces of 
data become available. But it does not follow that we should always, or even 
usually, work close to this limit. The analogy with an interaction mean square 
in a row-by-column table is close and persuasive. There are $r \cdot c$ individual
estimates of the interaction mean square, each based on just one of the residuals 
which remain after fitting rows and columns, each involving just one degree 
of freedom. How often does it pay us to calculate and compare all these sepa- 
rate estimates? Only very rarely. (It is often useful to calculate and compare 
a few estimates of an interaction mean square, each based on a reasonable 
portion of the available degrees of freedom.) The position with spectrum esti- 
mates is analogous and similar; to be effective we must estimate averages over 
well-selected frequency ranges. (This is in addition to the averaging over time 
necessitated by lack of perfect stationarity.) In both instances, interaction 
mean square and spectral estimate, it does not pay to try to estimate too much 
detail, even if the detail is really there. 
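
The trade between detail and stability can be seen in a small simulation. This is a sketch under assumptions: the data are simulated white noise (so the true spectrum is flat), and simple non-overlapping band averages of raw periodogram ordinates stand in for whatever averaging one would actually use.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4096
x = rng.standard_normal(n)            # white noise: flat true spectrum

# Raw periodogram ordinates, excluding zero frequency and the Nyquist point.
raw = np.abs(np.fft.rfft(x)[1:n // 2]) ** 2 / n

def band_average(ordinates, width):
    """Average non-overlapping groups of `width` adjacent ordinates."""
    usable = (len(ordinates) // width) * width
    return ordinates[:usable].reshape(-1, width).mean(axis=1)

# Relative scatter of the estimates about their common true value.
cv = {}
for width in (1, 4, 16, 64):
    est = band_average(raw, width)
    cv[width] = est.std() / est.mean()
    print(width, len(est), round(cv[width], 3))
```

Each widening of the band gives fewer, steadier estimates; the single-ordinate estimates, like single-degree-of-freedom interaction estimates, scatter about as widely as their own size.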


Proper error terms 


The question of the proper error term is a classic of the analysis of variance, 
often relied upon to separate the men from the boys and the pastry cooks. 
It is well recognized that, for example, the plot-to-plot error of an agricultural 
experiment is almost certain to be too small, specifically because it rules out 
place-to-place and year-to-year components of variation. It is not too great 
a stretch to consider this question, which arises for time series in an only slightly 
different form, a fourth example of the general principle of parsimony. For 
while it will not be costly to estimate plot-to-plot variance, it is likely to be 
costly to trust it, to use such estimates as error estimates. Even its estimation 
may be costly, in the agricultural situation, if the result is to expend too much 
effort on choosing the optimum plot size, on doing one’s best to reduce what 
may be a minor source of variation. As Jenkins points out at the very end of 
his paper, it is not uncommon for spectrum estimates based upon different 
experimental repetitions to differ more than might be expected from their 
internal behavior. (Statisticians familiar with any of a wide variety of other 
situations would be surprised if this were not so, if external error were not larger 
than internal error.) As a consequence, it is not likely to be worthwhile to expend 
too much effort in using estimates whose windows have optimum widths and
optimum detailed shapes, since this may mean exerting a large effort to mini- 
mize a minor component of variability. 

One way to describe matters is in terms of alternative ensembles. In each 
repetition of the experiment, the time series which is actually realized is drawn 
from a different ensemble (from a different population each element of which 
is a whole time series). Such a description is entirely analogous to a description 
of an agricultural experiment in which each local comparison of two treatments 
is drawn from a population, but the populations for different “places” or “years”
differ. The fact that matters may be appropriately described in such a way
often affects what we wish to estimate. If an average comparison, in the
agricultural situation, depends upon the “place” in a way, or for reasons, that we
do not understand, we are usually driven to estimate, not average responses
at individual places, but rather average responses for all places. (These are
the natural “main effects”.) There are situations, however, as for example
when studying a cheaper substitute to see if it causes occasional deleterious 
effects, where we may need, because of variation from place to place, to esti- 
mate the value of the least favorable average response and, perhaps, the fre- 
quency with which similarly unfavorable situations will arise in more extended 
practice. The situation with time series is exactly similar. 

Most of the time we shall be driven to estimation of a spectrum averaged 
over repetitions, where the pattern, or the causes, of the changes in spectrum 
from repetition to repetition are not understood. This averaging over
repetitions, forced on us by alternate ensembles, is superposed upon the averaging
over time within repetition, partially forced upon us by nonstationarity, and 
upon the averaging over frequency bands, forced upon us by the limited extent 
and amount of our data. What we estimate, then, is an average of averages 
of averages. We have come a long way from the idea of a tight specification- 
estimation relationship, where everything which is not presupposed should be 
estimated. But it is well that we have done so. And no one who has considered 
carefully what is estimated by a main effect in a reasonably complex analysis 
of variance can maintain that so much averaging is surprising or unusual. 

Just as in more conventional areas of statistical application, there are situa- 
tions, the comparison of vibration intensity with structural strength being 
perhaps the most obvious, where we shall need to estimate not the average 
spectrum but some upper limit, perhaps an upper 99% limit, for the spectra 
in the various replications, for the spectra of the various alternative ensembles. 
But such instances are the exception, not the rule. 


Effects upon balance between stability and resolution. 


In any case, the presence of true differences between repetitions, of differences 
between the spectra of the alternative ensembles, will surely force a readjust- 
ment of the balance between stability and resolution. The main reason for 
estimating average spectral densities over relatively broad frequency bands 
is to assure moderate stability of estimate. If variation within ensembles should 
be small compared to variation between ensembles, such within-ensemble sta- 
bility is of little value to us. Thus we can afford, in such circumstances, to 
improve our frequency resolution by estimating spectral densities averaged
over narrower bands. (There will still remain a natural limitation on resolution, 
however, associated with the limited duration of the individual ensembles.) 
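
The readjustment of balance can be illustrated with a simple variance-components calculation. The model here is an assumption made for illustration only: the relative variance of a spectrum estimate within one repetition is taken as roughly 1/(stretch length × bandwidth), a between-repetition relative variance is added to it, and the sum is divided by the number of repetitions. The function name and all numbers are hypothetical.

```python
def relative_variance(bandwidth, stretch_length, repetitions, between_var):
    """Relative variance of a spectrum estimate averaged over repetitions,
    under an assumed within-plus-between variance-components model."""
    within_var = 1.0 / (stretch_length * bandwidth)
    return (between_var + within_var) / repetitions

T, R = 1000.0, 5
wide, narrow = 0.05, 0.0125      # the narrow band gives 4 times the resolution

# Cost of the finer resolution when repetitions truly differ ...
ratio_with = (relative_variance(narrow, T, R, 0.25)
              / relative_variance(wide, T, R, 0.25))
# ... and when they do not.
ratio_without = (relative_variance(narrow, T, R, 0.0)
                 / relative_variance(wide, T, R, 0.0))
print(ratio_with, ratio_without)
```

Under these assumed numbers, quadrupling the resolution quadruples the variance when there is no between-repetition component, but raises it only modestly when that component dominates.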


IV: SPECIAL PROBLEMS OF TIME SERIES


Resolution


The notion of resolution, as applied in optics and other branches of physics, 
is a well-recognized and useful physical concept. It does not have any single 
definition in numerical terms, and it is well that it does not. For the general 
idea that “higher resolution” means “capable of detecting more detail” is
clear, while any one way of making it quantitative would not be universally
satisfactory. (If you like, “resolution” is not “unidimensional”. But whether
you like this fact or not, it would be unwise to make it unidimensional by a
fiat of definition.) Jenkins and Parzen have introduced us to a number of
definitions of bandwidth. There are, and will be, other such definitions. The
value of any of them lies in what the values of the variously defined
bandwidths tell us about “resolution”. No one definition, nor even all the
definitions so far given, can tell us all about resolution. As Goodman pointed out
in his verbal discussion, such matters as “rejection slope in db/octave away
from the major lobe” or “db of rejection at a particular frequency” can be
important in particular circumstances. Thus numerical values of bandwidths
according to any definition closely related to “resolution” can help us, but
they will help us most if we regard them as telling us part, not all, of the story. 


Choice of resolution 


There is one matter upon which I should not like to have my views mis- 
understood: the desirability in exploratory work of making spectral analyses 
of the same data with different resolutions (usually represented in packaged 
systems of calculation of spectrum analysis by the use of varying numbers 
of lags in the initial computing step, which is the calculation of sums of lagged 
products). Let me be quite clear that, in my judgment and according to my 
experience, it definitely is very often desirable in exploratory work, and sometimes 
essential, to make analyses of the same data at differing resolutions. Moreover, 
it may be equally important to use different window shapes and different pre- 
whitenings. 

The place where Jenkins and I differ seriously, at least verbally (and I suspect 
the difference is more verbal than actual) is in the utility of examining some 
sequence of mean lagged products as a firm basis for choosing the number 
of such values to be inserted in an appropriate Fourier transformer, and trans- 
formed into spectral estimates. Our difference is greater still in connection 
with the adequacy of the point of apparent “damping down” of these values
as a basis for choosing this number. It is not that knowledge of the “damping
down” lag is not useful, but rather that, at least in my view, its unthinking
use may be dangerous. 

On the one hand, I have known of cases where the useful estimates of power 
spectra came from stopping well short of the damping-down point. On the 
other hand, if the spectrum were to contain one very large, very broad, very
smooth peak, and a close group of small, narrow peaks, the mean lagged products 
would appear to damp down at a lag associated with the width of the large 
broad peak, so that a spectrum whose resolution was associated with this 
damping-down point would fail to resolve the close group of small peaks. Here, 
as in all sorts of data analysis, there is no substitute for careful thought com- 
bined with trial of various alternatives. 
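
The second case can be mimicked with a made-up autocovariance sequence. Everything in this sketch is an assumption chosen for illustration: each peak is represented by a damped cosine, the peak powers, widths and centers are invented, “damped down” is taken to mean staying below 2% of the lag-zero value, and the lag needed to separate two peaks is taken by rule of thumb as $2\pi$ divided by their separation.

```python
import numpy as np

lags = np.arange(0, 401)

def peak(power, decay, center):
    """Autocovariance contribution of one spectral peak (damped cosine)."""
    return power * np.exp(-decay * lags) * np.cos(center * lags)

separation = 0.05                                    # radians per sample
broad = peak(100.0, 0.5, 1.0)                        # very large, very broad
narrow = peak(0.5, 0.005, 2.0) + peak(0.5, 0.005, 2.0 + separation)
c = broad + narrow

# First lag from which every later value stays below 2% of the lag-0 value.
below = np.abs(c) < 0.02 * c[0]
damping_lag = int(next(k for k in lags if below[k:].all()))

# Rough number of lags needed to separate the two narrow peaks.
lags_to_resolve = int(np.ceil(2 * np.pi / separation))
print(damping_lag, lags_to_resolve)
```

In this invented case the lagged products appear to damp down more than ten times sooner than the lag needed to separate the close pair of small peaks.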

It is natural to be tempted into calculating more spectrum estimates than 
the number of mean lagged products used as their basis. This temptation need 
not be a dangerous one, once it is realized that, given the mean lagged products 
and the shape of the window, all the possible spectrum estimates lie on a cosine 
polynomial of degree equal to the number of lags used. Once the usual number 
of spectrum estimates have been calculated, they are enough to determine 
this polynomial, and the calculation of further estimates is equivalent to a 
process of cosine-polynomial interpolation. This does not mean that calculating 
more estimates is useless, or that the results of further calculation will lie close 
to the results of straight-line interpolation between the points already calculated. 
But it does mean that the additional estimates provide no new information, 
only more detailed exposition of information already present. And it means 
that drawing smooth freehand curves through the original spectral estimates 
is often much more useful than connecting them by segments of straight lines. 
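
The interpolation property can be checked directly. In this sketch the windowed mean lagged products are random numbers (a hypothetical stand-in for real data), scaling constants are ignored, and the “usual” estimates are taken at the equally spaced frequencies $\pi j/m$, $j = 0, \ldots, m$.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 12
# Hypothetical windowed mean lagged products for lags 0..m.
g = rng.standard_normal(m + 1) * np.exp(-0.2 * np.arange(m + 1))

def design(omega):
    """Rows of the cosine polynomial g_0 + 2 * sum_k g_k cos(k * omega)."""
    k = np.arange(m + 1)
    factors = np.where(k == 0, 1.0, 2.0)
    return factors * np.cos(np.outer(omega, k))

# The usual m + 1 spectrum estimates.
standard = np.pi * np.arange(m + 1) / m
estimates = design(standard) @ g

# Those estimates alone determine the polynomial ...
recovered = np.linalg.solve(design(standard), estimates)

# ... so estimates at any further frequencies follow by interpolation.
fine = np.linspace(0.0, np.pi, 10 * m + 1)
interpolated = design(fine) @ recovered
direct = design(fine) @ g

# Straight-line interpolation, by contrast, does not reproduce them.
linear_gap = np.max(np.abs(np.interp(fine, standard, estimates) - direct))
print(np.max(np.abs(interpolated - direct)), linear_gap)
```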


Blurred estimands 





In discussing the general principle of parsimony we emphasized the need 
to estimate averages over bands of frequencies. This point is so central to spec- 
tral analysis as to make its heuristic and intuitive understanding worth con- 
siderable effort. Let us begin with classical situations. If one has more degrees 
of freedom than variance components, then one can find estimates of some 
(and perhaps all) of these variance components whose average values do not 
depend upon the other variance components. But once there are more variance 
components than degrees of freedom, this need not be the case. Consider a 
two-way $r$-by-$c$ array of observations in which there are $r \cdot c + 2$ variance
components, viz. a rows variance component, a columns variance component, and
one variance component for each of the $r \cdot c$ cells. (This is a natural model when
the variance of the cell contributions varies irregularly from cell to cell.) In
this situation there is no estimate of any of the $r \cdot c$ cell variance components
whose average value is free of all the other variance components.

In the time series case there are very many more variance components than 
degrees of freedom. For, unless some periodicity assumption holds perfectly 
(and I know of not a single instance where it does), a contribution of the form 


$$A \cos \omega t + B \sin \omega t$$


is permissible for any value of $\omega$ in some interval. And as all statisticians know
from bitter experience, at least all the things that are permissible will happen.
Thus, in principle, there are infinitely many variance components, one for
each possible $\omega$. And, when the realities of band-limiting and of finite duration
of data are faced, there are only a finite number of observations available, and
hence only a finite number of degrees of freedom. There is no hope of
estimating all variance components here, even by using impractically unstable
estimates.


Bracketting undesired effects 


Let us return, for the moment, to a situation with a finite number of variance 
components, only four of which will enter our discussion. Let us suppose that 
we are interested in estimating a particular one of these variance components, 
$\sigma_1^2$, and that our choice has narrowed down to three quadratic functions of
the observations, whose average values are


$$\begin{aligned}
\operatorname{ave}\{A\} &= \sigma_1^2 + 0.04\sigma_2^2 - 0.02\sigma_3^2 + 0.01\sigma_4^2 \\
\operatorname{ave}\{B\} &= \sigma_1^2 + 0.06\sigma_2^2 + 0.04\sigma_3^2 + 0.02\sigma_4^2 \\
\operatorname{ave}\{C\} &= \sigma_1^2 - 0.08\sigma_2^2 - 0.05\sigma_3^2 - 0.03\sigma_4^2
\end{aligned}$$


So long as we insist on using only a single quadratic function of the observations,
the choice of $A$, whose average value is least affected by $\sigma_2^2$, $\sigma_3^2$, and $\sigma_4^2$, has
a real advantage. But if we were willing to look at two quadratic functions
of the observations together, then $B$ and $C$ are a more effective choice, at least
so far as average values go. For, on the average, one is raised by the other
variance components, while the other is lowered. If, for example, the
observations are replicated $m$ times, so that there are $m$ $A$’s, $m$ $B$’s, and $m$ $C$’s, and
so that, consequently,


$$\bar{B} + t s_B/\sqrt{m}$$


is an upper confidence limit for ave $\{B\}$, while


$$\bar{C} - t s_C/\sqrt{m}$$


is a lower confidence limit for ave $\{C\}$, then the interval


$$\left(\bar{C} - t s_C/\sqrt{m},\; \bar{B} + t s_B/\sqrt{m}\right)$$


is a confidence interval for $\sigma_1^2$, without regard for the values of $\sigma_2^2$, $\sigma_3^2$, and $\sigma_4^2$.
(No such confidence interval can be based upon the m values of A.) When- 
ever we cannot get estimates (of what we want to estimate) whose average 
values are wholly free of what we do not want to estimate, the use of such 
paired estimates, one underestimating and the other overestimating, is likely 
to be useful and, perhaps, even necessary. 
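
The bracketing can be verified by brute force with the coefficients of the example. In this sketch the value of the wanted component and the range from which the three unwanted components are drawn are arbitrary assumptions.

```python
import numpy as np

# Coefficients of the three unwanted variance components in ave{A}, ave{B},
# ave{C}, as given in the example.
coef = {
    "A": np.array([0.04, -0.02, 0.01]),
    "B": np.array([0.06, 0.04, 0.02]),
    "C": np.array([-0.08, -0.05, -0.03]),
}

rng = np.random.default_rng(3)
sigma1_sq = 1.0                                   # the component we want
others = rng.uniform(0.0, 10.0, size=(1000, 3))   # arbitrary nuisance values

ave = {name: sigma1_sq + others @ c for name, c in coef.items()}

# ave{C} never exceeds the target and ave{B} never falls below it ...
bracketed = bool(np.all((ave["C"] <= sigma1_sq) & (sigma1_sq <= ave["B"])))
# ... whereas ave{A} can err in either direction.
a_can_be_high = bool(np.any(ave["A"] > sigma1_sq))
a_can_be_low = bool(np.any(ave["A"] < sigma1_sq))
print(bracketed, a_can_be_high, a_can_be_low)
```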

When we make estimates of spectrum densities, the window which relates 
the average value of our estimate to the spectrum is (for the apparently in- 
escapable case of equally-spaced data) inevitably a cosine polynomial (of degree 
no larger than the index of the longest lag used). It can vanish at only a finite 
number of points. Consequently its main lobe, which points out the band of 
frequencies over which we seek to estimate some average spectrum density, 
is inevitably accompanied by minor lobes which allow leakage from the parts 
of the spectrum outside the desired band to affect the average value of our 
estimate, and hence to affect its individual values. Even if we are willing to 
accept the blurring due to averaging within the major lobe, as we must, like 
it or not, we are rightly reluctant to face unknown possibilities of leakage from
other parts of the spectrum. The cure is the same as for the example with four 
variance components: use two estimates. (This time one estimate should have 
all minor lobes negative while the other has all minor lobes positive.) This 
general situation is discussed more fully elsewhere [Tukey 1961 (?)], and it 
is to be hoped that some suitable pairs of estimates will soon be explicitly 
available. (For one pair see Wonnacott 1961.) 
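
The sign pattern of the minor lobes is easy to inspect for familiar lag weights. The sketch below is illustrative only and is not the pair of estimates called for above: it evaluates the cosine-polynomial window for equal lag weights and for triangular (Bartlett) lag weights, with the number of lags chosen arbitrarily.

```python
import numpy as np

m = 10
omega = np.linspace(0.0, np.pi, 2001)
k = np.arange(1, m + 1)

def spectral_window(lag_weights):
    """Cosine polynomial 1 + 2 * sum_k w_k cos(k * omega) for lags 1..m."""
    return 1.0 + 2.0 * np.cos(np.outer(omega, k)) @ lag_weights

truncated = spectral_window(np.ones(m))      # equal weights up to lag m
bartlett = spectral_window(1.0 - k / m)      # triangular weights

# Equal weights give minor lobes of alternating sign; the triangular weights
# give a window that is nowhere negative, so all its minor lobes are positive.
print(truncated.min(), bartlett.min())
```

A window of the second kind can only overestimate through leakage; a companion whose minor lobes were all negative would supply the other side of the bracket.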


Kinds of asymptosis 


The purpose of asymptotic theory in statistics is simple: to provide usable 
approximations before passage to the limit. Consequently asymptotic results 
and asymptotic problems are likely to be of limited utility when the finiteness 
of a sample size or of some other quantity is of overwhelming importance. 
(Thus, for example, the theorem that maximum likelihood estimates are asymp- 
totically normally distributed with a certain variance-covariance matrix is 
rarely of any use when there are only 1 or 2 degrees of freedom for error.) It 
is sometimes hard, but almost always important, to remember this fact. 

Time series analysis follows its usual pattern, “like most statistical areas,
only more so!”, insofar as asymptosis is concerned. For there are three distinct
ways in which time series data could tend toward a simplifying limit: 


(1) The total extent of all the stretches of data available could become 
more nearly infinite. 

(2) The extent of each individual stretch of data could become more nearly 
infinite. 

(3) The bandwidth of the measurement could become more nearly infinite 
(requiring a more nearly vanishing interval between times of recording).


The consequences of these three, which are quite distinct, depend upon whether 
the resolution of the estimates to be made (a) remains constant, (b) increases 
as fast as the total extent, extent, or bandwidth of the data, or (c) behaves in an 
intermediate manner. 

If (1) occurs without (2) or (3), the possible resolution does not increase, 
so that (a) is the only relevant situation. The stability of individual estimates 
of (averaged) spectrum density then increases essentially proportionally to 
the total extent of data. 

If (2) proceeds, (1) must also. If (2) and (1) proceed without (3), the range 
of (aliassed) frequencies to be considered will not change, so that a constant 
number of estimates corresponds to constant resolution, and to an increase 
in stability essentially proportional to total extent of data. If, on the other 
hand, the resolution is increased proportionally to the total extent of data, 
the stability of individual estimates will remain constant. 

If (3) proceeds without (1) or (2), we may make estimates over a wider and 
wider frequency range, but we cannot obtain higher and higher resolution. 
For constant resolution, we obtain constant stability. 
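
These cases can be summarized with a common rule of thumb, used here as an assumption rather than as a result from the text: the number of degrees of freedom of a spectrum estimate is about twice the product of the total extent of data and the bandwidth of the estimate. The function name and the numbers are hypothetical.

```python
def approx_degrees_of_freedom(total_extent, resolution_bandwidth):
    """Rule-of-thumb stability: about 2 * (total extent) * (bandwidth)."""
    return 2.0 * total_extent * resolution_bandwidth

base = approx_degrees_of_freedom(1000.0, 0.01)

# (a) Resolution held constant while the total extent doubles.
constant_resolution = approx_degrees_of_freedom(2000.0, 0.01)

# (b) Resolution increased in proportion to the total extent, so that the
#     bandwidth of each estimate is halved.
proportional_resolution = approx_degrees_of_freedom(2000.0, 0.005)

print(base, constant_resolution, proportional_resolution)
```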

In practice, where there are several repetitions, several stretches of data, 
it may be that we can wisely treat the total extent of all data stretches asymp- 
totically (especially when the additional variability in external error should 
be considered), but I know of no single practical instance where an asymptotic
treatment of either stretch length or band-limitation gives useful results. 

The limitation on ultimate resolution due to limited extent of data stretches, 
and the limitation on frequency ranges for which estimates can be made due 
to band-limiting, always seem to behave like small-sample phenomena, and 
must be faced in detail. They do not at all behave like large-sample phenomena, 
where everything can be “smoothed out” and treated in a limiting, continuous
way. 


V: THE MORAL


To analyze time series effectively we must do the same as in any other area 
of statistical technique: “Fear the Lord and Shame the Devil” by admitting that:


(1) The complexity of the situation we study is greater than the complexity 
of that description of it offered by our estimates. 

(2) Balancing of one ill against another in choosing the way data is either
to be gathered or to be initially analyzed always requires knowledge
of quantities which cannot be merely hypothesized, and which, in many
cases, we cannot usefully estimate from a single body of data, such
as ratios of (detailed) variance components or extents of non-normality.
Theoretical optimizations based upon specific values of such quantities
may be useful guides, but only when the failure of past experience (and
the present data) to give precise values for these quantities is recognized
and allowed for.

(3) There is no substitute for some sort of repetition as a basis for assessing 
stability of estimates and establishing confidence limits. 

(4) Asymptotic theory must be a tool, and not a master. 


The only difference is that one must be far more conscious of these accept- 
ances in time series analysis than in most other statistical areas. 

In a single sentence, the moral is: ADMIT THAT COMPLEXITY ALWAYS 
INCREASES, FIRST FROM THE MODEL YOU FIT TO THE DATA, 
THENCE TO THE MODEL YOU USE TO THINK AND PLAN ABOUT 
THE EXPERIMENT AND ITS ANALYSIS, AND THENCE TO THE 
TRUE SITUATION. 


VI: THREE MYSTERIES


Up to this point, we have been concerned with the fundamentals of time 
series analysis and with the close and cogent analogies between time series 
analysis and other areas of statistics. As a consequence our remarks have related 
most closely to the first of the two papers. It is now time to turn to the second 
paper, which grapples with some of the more detailed aspects of time series 
analysis. Here it seems best to try to shed light on a few of the aspects which 
are likely to seem most mysterious. Our attention will be given to the mysterious 
importance of dividing sums of lagged products by $n$ rather than by $n - k$,
to the mystery of how new window patterns are sought, and to the mysterious 
importance of choosing a window. 


Does the divisor matter?


The major computational effort, as measured in millions of multiplications 
or minutes of machine time, of any conventional careful spectral analysis is 
expended on the calculation of the sums of lagged products 


$$\sum\nolimits_{(k)} = \sum_{t=1}^{n-k} X_t X_{t+k}$$


(If these are calculated for $k = 0, 1, 2, \ldots, m$, some $(m + 1)n - m(m + 1)/2 \approx m \cdot n$
multiplications will be required.) The $X_t$ in this calculation will be raw,
or prewhitened, or otherwise modified observations, from which means, fitted
polynomials, or other fitted trends may or may not have been subtracted.
Unless unusually careful preparatory steps for the elimination of very low
frequencies were already taken in the preparation of the $X_t$, the next step
after calculating these sums of lagged products will be adjustment of these 
sums of lagged products for means or trends. It is vital to deal in practice with 
such adjusted sums of lagged products, as almost everyone who enters upon 
time series analysis seems to have to learn for himself. (However, it will save 
space and, hopefully, promote clarity if we omit the word “adjusted” during 
the remainder of this discussion. We shall omit it.) Having been told of sums 
of lagged products, every analyst of variance expects us to go on to mean lagged 
products. Going on is inevitable. 
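
The calculation of the sums of lagged products, and the count of multiplications it costs, can be sketched as follows. The function name is hypothetical, the data are simulated, and no adjustment for means or trends is made here.

```python
import numpy as np

def lagged_product_sums(x, m):
    """Sums of lagged products for lags 0..m, plus the multiplication count."""
    n = len(x)
    # Lag k pairs x[t] with x[t + k]; there are n - k such pairs.
    sums = np.array([np.dot(x[: n - k], x[k:]) for k in range(m + 1)])
    multiplications = sum(n - k for k in range(m + 1))
    return sums, multiplications

rng = np.random.default_rng(5)
n, m = 1000, 50
x = rng.standard_normal(n)
sums, multiplications = lagged_product_sums(x, m)

# One product per pair at each lag, which comes to nearly m * n in all.
print(multiplications, m * n)
```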

There is a question of the appropriate divisor. If we had not corrected for 
the mean (or any trend) there are cases to be made for both $n$ and $n - k$. If
we had corrected for, say, a general linear trend (which absorbs 2 degrees of
freedom), there are cases to be made for $n$, for $n - 2$, for $n - k$ and for $n - k - 2$.
Parzen gives attention, between his (4.6) and (4.7), to some of the reasons for
choosing $n$ or $n - 2$ rather than $n - k$ or $n - k - 2$. By analogy with the
analysis of variance we might feel that $n - k - 2$ (or, when no adjustment
is made, $n - k$) would be desirable because unbiasedness is good. The
unbiasedness argument is found not to be a strong one in the time series situation.

Is this choice an important one for the analyst or investigator whose concern 
is with the spectrum? You should be happy to be told that the answer is “no”.
If one’s concern is with the spectrum, then the most important thing about 
any quadratic function of the observations is the spectrum window which 
expresses the average value of the estimate in terms of the spectrum of the 
ensemble. (The next most important thing is, of course, the variability of the 
quadratic function.) This is just what we should expect for a variance-com- 
ponent problem, where means and other linear combinations of the observations 
are without direct interest. For if, in some very complex (probably unbalanced 
to begin with, and then peppered with missing plots) analysis of variance, 
one is given the values of certain mean squares (or other quadratic functions 
of the observations), the first question one concerned with variance components 
asks is “How are the average values of these mean squares expressible in terms
of our variance components?”. (The question about stability, “How many
degrees of freedom should be assigned to each?”, is important but secondary.)
If we know the windows associated with our spectrum estimates, we need not
be concerned, in the first instance, with how these estimates were obtained.
And, moreover, any linear combination of the results of dividing the sums of
lagged products by n is also a linear combination of the results of dividing
the sums of lagged products by n − k, and vice versa.
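
The equivalence of the two divisors for linear combinations can be made concrete. In the sketch below, which is only an illustration with hypothetical sums and weights, a weight w_k applied to the sum divided by n is reproduced by the weight w_k (n − k)/n applied to the sum divided by n − k.

```python
def convert_weights(weights_for_n, n):
    """Weights on sums/(n - k) that reproduce a given combination of sums/n."""
    return [w * (n - k) / n for k, w in enumerate(weights_for_n)]


n = 10
sums = [55.0, 40.0, 26.0, 14.0]      # hypothetical sums of lagged products, lags 0..3
weights = [1.0, 0.75, 0.5, 0.25]     # hypothetical weights for the divisor-n version

by_n = sum(w * s / n for w, s in zip(weights, sums))
converted = convert_weights(weights, n)
by_n_minus_k = sum(w * s / (n - k) for k, (w, s) in enumerate(zip(converted, sums)))
print(by_n, by_n_minus_k)
```

The two results agree up to rounding, which is the sense in which the choice of divisor only relabels the weights.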

The practicing spectrum analyst need not be concerned with division by 
n or n − k, so long as he doesn’t mis-assemble formulas by combining some
which are appropriate for one divisor with others appropriate for the other. 

However, those interested in the theory of spectrum analysis do need to give 
some attention to this choice, partly because of the reasons given by Parzen, 
partly because this choice affects just what functions of frequency the mean 
lagged products are Fourier transforms of, partly for various other reasons. 
The man who has a practical interest in the autocovariance function, if there 
really be such, clearly also has to take an interest in alternative estimates. 

Unlikely though it may seem at first, there is a moderately close analogy 
between the biased estimates supported by Parzen and biased estimates which 
are reasonable in classical analysis of variance. Consider data in a single
classification with r observations in each class, so that the between mean square
has average value $\sigma^2 + r\sigma_b^2$, where $\sigma^2$ is the error variance component, and
$\sigma_b^2$ is the between variance component. If we wish to estimate the population
average corresponding to a particular classification, there is little doubt that
the sample mean for that classification is the most reasonable estimate. But
if we wish to depict the pattern of the population averages corresponding to
all classifications, we should do something about the inflation of this pattern
by error variance; we should replace the pattern of observed means by a suitably
shrunken pattern. (In the simplest cases it may suffice to shrink each
classification mean toward the grand mean by the factor $[r\sigma_b^2/(\sigma^2 + r\sigma_b^2)]^{1/2}$. In others
the method developed by Eddington for dealing with stellar statistics [Trumpler
and Weaver 1953, pp. 101-104] may need to be applied.) The analogy with
the time series case is reasonably, in fact surprisingly, close. If we wanted to 
estimate just one autocovariance, we should undoubtedly use the unbiased 
estimate. But if we are concerned with the pattern made by the estimated 
values, with the nature of the autocovariance function, we may, as Parzen 
points out, do better to use the biased estimate. 
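
One way to arrive at a shrinkage factor of the square-root form is sketched below. It assumes that the aim is only to make the spread of the depicted pattern match the between component; the symbols $\sigma_b^2$ (between component), $\bar{y}_i$ (observed class mean), $\bar{y}$ (grand mean) and $f$ (shrinkage factor) are notation introduced here for illustration.

```latex
% Spread of the observed class means about the grand mean:
\operatorname{var}(\bar{y}_i) = \sigma_b^2 + \frac{\sigma^2}{r}.
% Shrunken means: \bar{y} + f(\bar{y}_i - \bar{y}). Matching their spread to \sigma_b^2:
f^2 \left( \sigma_b^2 + \frac{\sigma^2}{r} \right) = \sigma_b^2
\quad\Longrightarrow\quad
f = \left[ \frac{r\sigma_b^2}{\sigma^2 + r\sigma_b^2} \right]^{1/2}.
```

The factor is close to 1 when the between component dominates and close to 0 when the error variance dominates.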

(The extreme instance of the problem underlying this choice in the time
series case arises when one 5-minute record is “cross-correlated” [really
cross-covarianced] with another 5-minute stretch of the same time series, as recorded
an hour, a day, or a week later. If the spectrum of the ensemble is relatively
sharp, the average value of the covariance will still tend to zero, but the average
value of its square will tend, not to zero, but to a value depending upon the
product of the 5-minute duration with the width of the spectral peak. Thus
if one calculates autocovariances at lags from 24 hours 0 minutes to 25 hours
5 minutes one will almost certainly find an apparently systematic wavy pattern
in the unbiased estimates of autocovariances or autocorrelations computed
for a particular realization. It is natural to believe that this pattern is “real”,
although the true average values of the autocovariances are actually very,
very much smaller in magnitude than the values found from a single realization.
Such patterns can be so regular as to mislead investigators into an unwarranted
belief that the presence of a strikingly accurate underlying clock has been
demonstrated.)


How can I construct a window? 


If we leave aside a few matters which really do not matter here, although
some of them are very important elsewhere (such as adjustment for the mean,
other devices for rejection of very low frequencies, and division by n − k not n),
the function of lag by which the mean lagged products are multiplied before
Fourier transformation, and the window (expressed in terms of $\omega - \omega_0$ and
$\omega + \omega_0$ separately, where $\omega_0$ is the center frequency of the estimate) through
which the spectrum determines the average value of the estimate, are Fourier
transforms of one another. (If you have never followed a derivation of this,
just take it on faith.) Since every lag must be a multiple of the data interval,
one of these functions is a finite array of spikes, spaced one data interval apart.
The other function is a polynomial in $\cos(\omega - \omega_0)$ of an appropriate degree.
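
The transform pair can be illustrated numerically. The Python sketch below is a minimal example under stated assumptions: the function name `spectral_window` is hypothetical, `omega` stands for the offset from the center frequency in radians per data interval, and the triangular lag weights are chosen only because their transform has a simple closed form. Each cos(k·omega) is a polynomial of degree k in cos(omega), so the result is a polynomial of degree m in cos(omega).

```python
import math


def spectral_window(lag_weights, omega):
    """Cosine transform of symmetric lag weights w_0..w_m at frequency offset omega."""
    return lag_weights[0] + 2.0 * sum(
        w * math.cos(k * omega) for k, w in enumerate(lag_weights) if k > 0
    )


m = 10
triangular = [1.0 - k / m for k in range(m + 1)]       # spikes one data interval apart
offsets = [math.pi * j / 50 for j in range(51)]
values = [spectral_window(triangular, omega) for omega in offsets]

# Closed form for these particular weights: (1/m) * (sin(m*omega/2) / sin(omega/2))**2
omega = offsets[7]
closed_form = (math.sin(m * omega / 2) / math.sin(omega / 2)) ** 2 / m
print(values[0], values[7], closed_form)
```

For these weights the window equals m at zero offset and never goes negative, as the closed form shows.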

While the discreteness of time is generally an important aspect of the data, 
it is not important for our present purposes, so that we may replace the spiky 
lag window by a smooth function of a continuous variable without altering 
its Fourier transform in any way which is essential to the present discussion. 
(Provided that we began with, say, at least 10-20 spikes.) Since we are going 
to calculate mean lagged products for only a finite number of lags, this continuous 
lag window must vanish outside a finite interval. If it were possible, we would 
like to have its Fourier transform, the corresponding spectrum window, also 
vanish outside a finite interval, for then the average value of the corresponding 
spectrum estimate would only involve contributions from a restricted part of 
the spectrum. 

It is, however, well known that a function and its Fourier transform cannot
both vanish outside finite intervals. Indeed, they cannot both go to zero too 
rapidly as their arguments tend to infinity. The standard example of a function 
which, together with its Fourier transform, goes to zero rapidly at infinity is 
the standard normal density function, which together with its Fourier trans- 
form, goes to zero as the negative exponential of half the square of its argument. 
Unfortunately, we cannot make use of the normal density as a lag window, 
because it does not vanish outside a finite interval. 

Every statistician knows, however (or so the phrase goes), how to approximate
a normal distribution by a bounded distribution. It is only necessary to
consider the distribution of means of simple random samples from any bounded
parent distribution. And what parent distribution could be simpler than the
rectangular (uniform) distribution? If we take samples of size k, the Fourier
transform of the distribution of means will be of the form $(\sin u/u)^k$, where
u is a multiple of $\omega - \omega_0$, depending upon k and the number of lags used. The
larger is k, the smaller are the minor lobes of this window in comparison with
the main lobe, and the more lags are required to give a main lobe of prescribed
narrowness. If k = 1, which corresponds to a raw Fourier transform of the
mean lagged products, the minor lobes adjacent to the main lobe are about 0.22 times
the height of the main lobe (and negative), which proves to be impractical.
If k = 2, which corresponds to line 1 in Parzen’s Table 1, the minor lobes are
at most about 0.047 times the height of the main lobe, and the resulting spectral window, often
called the Bartlett window, is everywhere positive. If k = 4, which corresponds
to line 8 in Parzen’s Table 1, and to h,(u) in his Table 2, the minor lobes are
at most about 0.0022 times the height of the main lobe, and the resulting spectral window, as
Parzen shows, is quite effective.
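
The relative heights of the minor lobes of $(\sin u/u)^k$ can be checked by a direct search. The Python sketch below is an illustration only: the function name and grid size are arbitrary choices, the main lobe is taken to have height 1 at u = 0, and the search covers the first minor lobe, which lies between the zeros at π and 2π and is the largest one.

```python
import math


def first_minor_lobe_ratio(k, steps=20000):
    """Largest |(sin u / u)**k| for u strictly between pi and 2*pi."""
    best = 0.0
    for i in range(1, steps):
        u = math.pi + math.pi * i / steps
        best = max(best, abs(math.sin(u) / u) ** k)
    return best


ratios = {k: first_minor_lobe_ratio(k) for k in (1, 2, 4)}
for k, ratio in ratios.items():
    print(k, round(ratio, 4))
```

Raising the window to a higher power lowers the minor lobes rapidly, at the price of the wider main lobe (for a given number of lags) discussed in the text.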

It would be perfectly possible to use k = 8 or k = 16 if we wished even lower
minor lobes. The cost to us of doing this would be twofold. There would
have to be an increase in computational effort in order to provide mean lagged 
products for the additional lags required to give a main lobe of comparable 
width. And the shapes of the main lobes would be somewhat less favorable, 
since the process of raising the window to a higher and higher power will make 
both the minor lobes and the lower portions of the main lobe still lower. As 
a result the main lobe will “occupy” a smaller and smaller part of the frequency
band between the zeroes (of the window) which define it, and, consequently,
the variability of the corresponding estimate (leakage aside) will be greater
than that of an estimate with a more “blocky” spectrum window.

As is clear from Parzen’s paper, these are not the only useful lag windows,
the “cosine-arch” or “hanning” lag window which is proportional to “one
plus cosine” being also of practical interest. This latter window was “discovered”
by empirical observation, and the best reasons for considering it are the properties
it is found to have.
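
One of the convenient properties of the “one plus cosine” lag window can be verified directly: weighting lag k by ½(1 + cos(πk/m)) before transformation gives the same result as smoothing the raw transform over neighboring frequencies, spaced π/m apart, with weights ¼, ½, ¼. The Python sketch below checks this identity; the mean lagged products are hypothetical numbers and the function names are illustrative.

```python
import math


def raw_estimate(c, omega):
    """Raw cosine transform of mean lagged products c[0..m] at frequency omega."""
    return c[0] + 2.0 * sum(c[k] * math.cos(k * omega) for k in range(1, len(c)))


def hanned_estimate(c, omega):
    """Same transform after weighting lag k by 0.5 * (1 + cos(pi * k / m))."""
    m = len(c) - 1
    w = [0.5 * (1.0 + math.cos(math.pi * k / m)) for k in range(m + 1)]
    return w[0] * c[0] + 2.0 * sum(
        w[k] * c[k] * math.cos(k * omega) for k in range(1, m + 1)
    )


c = [4.0, 2.5, 1.0, -0.5, -1.0, -0.2, 0.3]   # hypothetical mean lagged products, m = 6
step = math.pi / (len(c) - 1)
omega = 2 * step
smoothed = (0.25 * raw_estimate(c, omega - step)
            + 0.5 * raw_estimate(c, omega)
            + 0.25 * raw_estimate(c, omega + step))
print(hanned_estimate(c, omega), smoothed)
```

The identity follows from cos(kω) cos(kπ/m) = ½[cos k(ω + π/m) + cos k(ω − π/m)], so hanning may be applied either to the lagged products or to the raw estimates.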

(Two further easily understandable types of window which may sometimes 
prove useful may be obtained respectively, (i) by taking a truncated normal 
distribution as the lag window, (ii) by taking a Čebyšev polynomial for the
spectral window. This last choice makes all minor lobes of equal height, and 
as small in comparison with the main lobe as is possible for a given number 
of lags. This equality of height, which makes the minor lobes adjacent to the 
main lobes lower than those of most other windows but makes minor lobes 
far away from the main lobes relatively higher than those of most other windows, 
seems to prove to be a disadvantage rather more often than it proves to be an 
advantage.) 


How important is window choice? 


We have discussed window carpentry briefly. Now we need to ask what does 
it buy us, how much better can we do with a specially constructed window 
than with a rather routine one. This question has opposite answers, depending 
on whether one relies upon his window to do everything for him, or not. 

If one relies solely upon windows, faces a peaky or steeply slanting spectrum, 
and is concerned with the behavior of the spectrum where the density is notice- 
ably below its highest values, then the quality of workmanship and polish of 
the window used can easily be of the utmost importance. (During the early 
’50s I spent considerable effort on a variety of ways to improve windows. The 
results have never been published because it turned out, as will shortly be 
explained, to be easier to avoid the necessity for their use.) 

If one applies his windows, actually or effectively, not necessarily to the
original data but, whenever useful, to the results of simple linear modifications
of the original data, chosen so as to depress peaks, to raise valleys, and, where
necessary, to remove narrow peaks (which may appear to be “lines”), he will
rarely, if ever, find any need for anything beyond a window of routinely good
quality, such as the hanning or cosine arch window (or, if a slight increase
in variance of estimate and a substantial increase in computational effort are
worth bearing, the k = 4 window described above). (For discussion of
techniques of linear modification see Blackman and Tukey 1959, Holloway 1958,
and, perhaps, the work of the Labroustes referred to by Chapman and Bartels
[1940, p. 992] and Blackman and Tukey [1959, p. 180].) In my own experience
this sort of approach to the problem, which corresponds [Blackman and Tukey
1959, p. 42] to using different window shapes in different frequency bands, is
much easier than seeking out explicit forms for very special windows to meet
each special situation. Moreover [e.g. Blackman and Tukey 1959, pp. 62-63,
Tukey 1959, pp. 315-316], consideration of this technique leads to very helpful
insights into how the data is best gathered in the first place.

But each of us is entitled to do his calculations as he pleases, so long as he 
does adjust his techniques to provide the amounts of precision and stringency 
his problems require. 


VII: COMPUTATIONAL CONSIDERATIONS


It is important to say something about the role of computational efficiency
and computational choices as considerations in time series analysis.
Computational considerations are particularly important in time series analysis,
in part because of the relatively large amounts of data processed, in part because
of the very many multiplications involved in obtaining sums of lagged products, 
and in part for more subtle reasons. And it is sometimes hard, especially for 
the novice, to separate computational, statistical, and aims-and-purposes con- 
siderations, one from another. Yet if they are not separated, neither sound 
practices nor sound advice can be understood as such, rather than being taken 
on faith. 

Computational considerations depend very much on the equipment available. 
Crude spectral analysis is possible with paper and pencil [Blackman and Tukey 
1959, pp. 151-159], and modestly refined computations have been done on 
hand calculators. The beginning of effective spectrum calculation probably 
involves the use of punched-card tabulators to obtain sums of lagged products 
(by applying progressive digiting to cards obtained by off-set reproduction 
[Hartley 1946]) and the conduct of all further computation on hand calculators.
The steps from this to fully automatized spectral analysis on machines of the 
capacity and speed of an IBM 7090 or CDC 1604 are many and long. The 
reluctance or eagerness with which one faces another hundred thousand multi- 
plications depends very strikingly on the equipment available. 

And, consequently, so does one’s attitude toward using many more lags to 
improve window shape or increase resolution, or toward recomputing mean 
lagged products whenever new spectrum estimates (estimates differing in re- 
solution, in window shape, in prewhitening, or in rejection filtration) are to be 
obtained from the same data. In the economy of abundance which goes with 
modern electronic computers, I prefer to recompute mean lagged products
when a new set of spectrum estimates is required, but others feel quite
differently. Some of the reasons for this difference can be made manifest, and
their mention may serve to illuminate a variety of computational issues. 


To recompute or not to recompute? 


First, recomputation when necessary allows the use of packaged, unified 
machine programs, which require only values for a few constants and the data 
in order to provide the desired spectrum estimates. This makes it much easier
for those unsophisticated in time series analysis, whether investigators or
technical aides, to process data effectively. Most data analysis is
going to be done by the unsophisticated. As statisticians we have a responsibility 
to package as many techniques as possible for safe and effective use by those 
who will analyze data, and who will not understand why the choices in the 
package were made wisely or unwisely. 
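
What such a package might look like, in a present-day language, is sketched below. This is a minimal illustration and not a reconstruction of any program of the period: the function name `spectrum_estimates`, the choice of divisor n, the adjustment for the mean only, and the hanning lag window are all assumptions made here, and no prewhitening is attempted. The only "constants" required are the data and the number of lags m.

```python
import math


def spectrum_estimates(x, m):
    """Return m + 1 hanned spectrum estimates at frequencies pi * j / m, j = 0..m."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]                                  # adjust for the mean
    # mean lagged products, divisor n
    c = [sum(d[t] * d[t + k] for t in range(n - k)) / n for k in range(m + 1)]
    # "one plus cosine" lag window
    w = [0.5 * (1.0 + math.cos(math.pi * k / m)) for k in range(m + 1)]
    estimates = []
    for j in range(m + 1):
        omega = math.pi * j / m
        estimates.append(
            w[0] * c[0]
            + 2.0 * sum(w[k] * c[k] * math.cos(k * omega) for k in range(1, m + 1))
        )
    return estimates


# Usage: a pure cosine at frequency pi / 4, analyzed with m = 20 lags.
x = [math.cos(math.pi * t / 4.0) for t in range(400)]
estimates = spectrum_estimates(x, 20)
peak_index = estimates.index(max(estimates))
print(peak_index)
```

With m = 20 the frequency π/4 corresponds to j = 5, which is where the largest estimate falls.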

Next, and perhaps more important for the present, is the absence of adequate
facilities for data analysis. There is no data-analytic language analogous to
FORTRAN or ALGOL, in whose terms it is easy to describe the operations of
data-analysis, and, what is far more crucial, I know of no large machine
installation whose operations are adapted to the basic step-by-step character of
most data analysis, in which most answers coming out of the machine will,
after human consideration, return to the machine for further processing. Neither
programming languages nor computer center operations are adapted to
stepwise operation, and all of us who use big machines for data analysis are thus
forced to more unified operation than might otherwise be desirable.

Third, and this consideration is not related or restricted to big machines, 
stepwise computation tends to produce stepwise thinking. I believe that step- 
wise thinking led to the classical Schuster periodogram, and hence to decades 
of ineffective quiescence for frequency-oriented analysis of time series. The 
individual steps from data through intermediate results to periodogram ordinates 
seemed reasonable each by itself. And while Stumpff’s book recognized the 
nature of the corresponding spectral window before 1940 [Stumpff 1937, pp. 
98-100], nothing was done to provide more useful estimates until people began 
to relate average values of estimates to the spectrum of the ensemble of which 
the data is one realization. What security we can have in today’s frequency- 
oriented time-series analysis comes from over-all thinking, while many of the 
most threatening dangers come from step-by-step thinking. Thus we often 
do very much better to apply over-all processes (which have been thought 
through overall, not merely stepwise) to data than to apply the individual 
steps separately. This view does not deny the great desirability of “‘try, look, 
and try something a little different” as the typical pattern of data analysis. 
It merely asks that each trial, unless it is extremely exploratory, be thought 
through as a unit. It does not even say that it is unwise to calculate sums of 
lagged products once and for all. It only calls on those who do so to be sure 
that the total processes they apply to data have been thought through as wholes. 
It does, however, note that using preplanned packages increases the chances 
that such thinking will have been done. 


Precision may matter


Finally, there is a question of required precision of arithmetic. Let us approach 
this somewhat indirectly. In friendly conversation, James Durbin recently 
brought firmly to my attention that there was an alternative to first prewhitening 
the observations and then calculating sums of lagged products for these modified 
values, remarking that one might, instead, calculate rather more sums of lagged
products for the original observations, and then calculate the suitable simple 
linear combinations of these sums which would be identically equal to the sums 
of lagged products for the modified observations. This remark is surely well 
taken. The results are algebraically identical. And if spectrum estimates for 
the results of enough different prewhitenings of the same data are going to 
be required, then the computational path suggested by Durbin will surely have 
real advantages. But it behooves us equally to consider the possible disad- 
vantages of this alternate approach. Perhaps the greatest of these is the likely 
requirement of greater precision of arithmetic (although it is interesting to
note that, if only one set of spectrum estimates is to be calculated, prewhitening 
first will even save some multiplications). 
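
The two computational paths can be compared on a small example. The Python sketch below assumes the simplest prewhitening, y_t = x_t − a·x_{t−1}, with a hypothetical constant a; the function names and the random test series are illustrative. The combination used is (1 + a²)S_k − a(S_{k−1} + S_{k+1}), which needs one more lag of the original series. This sketch does not correct for the few products at the ends of the record, so the two paths agree only up to those end terms.

```python
import random


def lagged_sums(x, m):
    """S_k = sum over t of x[t] * x[t + k], for k = 0..m."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(m + 1)]


def prewhitened_sums_direct(x, a, m):
    """Prewhiten first, then form sums of lagged products of the modified values."""
    y = [x[t] - a * x[t - 1] for t in range(1, len(x))]
    return lagged_sums(y, m)


def prewhitened_sums_combined(x, a, m):
    """Combine sums for the original values at lags 0..m + 1 (end terms ignored)."""
    s = lagged_sums(x, m + 1)
    return [(1.0 + a * a) * s[k] - a * (s[abs(k - 1)] + s[k + 1]) for k in range(m + 1)]


rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(500)]    # illustrative data, |x| <= 1
a, m = 0.6, 5
direct = prewhitened_sums_direct(x, a, m)
combined = prewhitened_sums_combined(x, a, m)
largest_gap = max(abs(d - c) for d, c in zip(direct, combined))
print(largest_gap)
```

Here the discrepancy consists of at most four end products per lag, so it cannot exceed (1 + a)² times the square of the largest observation. The sketch says nothing about rounding error, which is the concern taken up in the text.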

This statement about accuracy sounds a little peculiar at first to one familiar 
with more classical statistical computations, but when he recalls the advantages 
of postponing divisions in calculating sums of squares of deviations (and in 
more general analysis-of-variance computations) he becomes aware of the 
practical inequivalence of algebraically identical forms of computation. 

An adequately prewhitened time series, at least one that is a realization from 
an ensemble which produces spectrum estimates which are even a quarter as 
variable as those provided by a Gaussian ensemble (most ensembles arising 
in practice will produce estimates more variable than those of a Gaussian en- 
semble), requires the observations to be recorded to, at most, only the precision 
offered by 1.5 to 2 decimal digits [Tukey 1959, pp. 319-320]. But one that is 
far from adequately prewhitened may require several decimal digits. This happens 
because the spread between the maximum and minimum observations is deter- 
mined by the (areas of) peaks in the spectrum, while the precision necessary 
to avoid serious loss of information about the spectrum is determined by the 
depths of its valleys. 

A similar difficulty can arise in so simple a situation as fitting a quadratic
polynomial, though there most statisticians would see the difficulty coming
and evade it. Thus if

$$y_i = 12.71 + 1{,}000{,}000\,x_i + 0.03\,(x_i^2 - 1/3) + \epsilon_i$$

where $x_i$ ranges from −1 to +1, var $\epsilon_i = 10^{-5}$, and we seek to find the quadratic
terms by ordinary quadratic regression, it will not suffice to use y-values with
only 7 decimal digits of precision, because rounding to units introduces
deviations of up to 0.50 (which is large compared to the maximum quadratic effect
of +0.02) and increases the effective error variance by a factor of more than 8000.
Similarly, in the time series case, if one is not prepared to prewhiten first,
when desirable, it is necessary to make provision for moderate to high precision
in input data, and correspondingly higher precision in accumulating sums of
lagged products. The most likely result is a program which computes sums of
lagged products in double-precision arithmetic, perhaps even floating-point
double-precision arithmetic. This means extra effort at many stages of the
computation.
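
The arithmetic behind the quadratic-regression example can be checked in a few lines. The Python sketch below rests on two assumptions that should be read as such: the error variance is taken to be 10⁻⁵ (the value consistent with the quoted factor of more than 8000), and the rounding error is treated as uniformly distributed over (−0.5, 0.5), giving it variance 1/12.

```python
error_variance = 1e-5             # assumed value of var(epsilon_i)
rounding_variance = 1.0 / 12.0    # variance of an error uniform on (-0.5, 0.5)
inflation = (error_variance + rounding_variance) / error_variance


def quadratic_effect(x):
    """Quadratic term 0.03 * (x**2 - 1/3) of the example."""
    return 0.03 * (x * x - 1.0 / 3.0)


# Largest quadratic effect over a grid of x values from -1 to +1.
largest_effect = max(quadratic_effect(-1.0 + 0.01 * i) for i in range(201))
print(round(inflation), largest_effect)
```

Under these assumptions the inflation factor is about 8334, and the quadratic term never exceeds 0.02, far below the rounding deviations of up to 0.50.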

No one of these four considerations rules out calculating sums of lagged
products once and for all, but each exerts pressure. The combined effect influences
me very much, but I must admit that they might not be as potent if the calcu- 
lations with which I was concerned were to be made on quite other computing 
equipment. 


VIII: OTHER INTRODUCTORY REFERENCES


Where is the statistician to seek further enlightenment about spectral analysis? 
It is hard to give extensive lists of highly informative sources, but some guidance 
may be helpful. 

One useful route for many statisticians will be to turn to instances where 
the techniques have been applied. A list of references to recent applications can
be found in either Tukey 1959a (pp. 408-411) or Tukey 1959b (pp. 327-330). 
These lists unfortunately omitted the 1957 Symposium at the Royal Statistical 
Society on the Analysis of Geophysical Time Series [Craddock 1957, Charnock 
1957, Rushton and Neumann 1957, and discussion], where further references 
to geophysical applications can be found. 

Expositions from one point of view or another have been attempted by Press 
and Tukey 1956, and Tukey 1959b. There is no substitute for reading Chapman 
and Bartels 1940, or one of Bartels’s other expositions of similar techniques, 
e.g. Bartels 1935. 

An account from the point of view of the user has been attempted by Black- 
man and Tukey [1959], who give a fair diversity of references. 

The more abstract background may be sought in Grenander and Rosenblatt 
1957, and in recent papers in Series B of the Journal of the Royal Statistical Society.

No expository account of the analysis of cross-spectra seems so far to exist. 
The only substantial reference continues to be the thesis of Goodman [1957], 
copies of which I understand can now be obtained from: Office of Scientific 
and Engineering Relations (Reprints), Space Technology Laboratories, Inc. 
P.O. Box 95001, Los Angeles 45, Calif. 


Let us begin by considering a zero mean Gaussian random function X(t).
The probability structure of such a random function is specified as follows.
For each n and every choice of times $t_1 < t_2 < t_3 < \cdots < t_n$ the values of
the function $X(t_1), X(t_2), \ldots, X(t_n)$ are distributed with an n-variate zero
mean Gaussian distribution. The multivariate Gaussian distribution is permitted
to be singular, i.e., the random values $X(t_1), X(t_2), \ldots, X(t_n)$ may satisfy a
deterministic linear equation

$$\sum_{j=1}^{n} a_j X(t_j) = 0. \tag{1}$$

Figure 1. Sketch of a Gaussian Random Function.


As is well known the zero mean multivariate Gaussian distribution is
determined by the covariance matrix of the random variables. Thus, the
distribution of $X(t_1), X(t_2), \ldots, X(t_n)$ is determined by the covariances

$$E\,X(t_j) X(t_k) = R(t_j, t_k), \qquad j, k = 1, 2, \ldots, n. \tag{2}$$


In Eq. (2) E denotes the expectation operator. Let us now consider a zero
mean stationary Gaussian random function X(t). Stationarity requires that the
probability structure of the random function be invariant under time
translation, i.e. for each n and each $\tau$ and every choice of times $t_1 < t_2 < \cdots < t_n$,
$X(t_1 + \tau), X(t_2 + \tau), \ldots, X(t_n + \tau)$ has the same distribution as
$X(t_1), X(t_2), \ldots, X(t_n)$. Thus, for each t and $\tau$

$$E\,X(t) X(t + \tau) = R(\tau). \tag{3}$$

The function $R(\tau)$ is called the autocovariance function of the stationary random
function X(t) and one notes that it depends solely on $\tau$. The autocovariance
function $R(\tau)$ determines the probability structure of the zero mean stationary
Gaussian random function X(t).

Figure 2. Spectrum of Random Function X(t) of Equation (4).


An example of a zero mean stationary Gaussian random function is now
given. Consider the discrete frequencies $0 < \omega_1 < \omega_2 < \cdots < \omega_j < \cdots < \omega_N$
and the random function X(t) given by the finite discrete Fourier representation

$$X(t) = \sum_{j=1}^{N} (a_j \cos \omega_j t + b_j \sin \omega_j t), \tag{4}$$

where the Fourier coefficients $a_j, b_j$, j = 1, 2, ..., N are independent zero
mean Gaussian random variables with variances

$$E a_j^2 = E b_j^2 = \sigma_j^2, \qquad j = 1, 2, \ldots, N. \tag{5}$$

The variance $\sigma_j^2$ considered as a function of frequency $\omega_j$ is called the spectrum
of the particular random function X(t) of Eq. (4). A spectrum is displayed
graphically in Fig. 2.

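One has now to verify that X(t) of Eq. (4) is indeed a zero mean stationary
Gaussian random function. It is convenient to rewrite Eq. (4) in the complex
form

$$X(t) = {\sum_{j=-N}^{N}}' c_j e^{i\omega_j t} \tag{6}$$

where

$$c_j = \tfrac{1}{2}(a_j - i b_j), \qquad c_{-j} = \bar{c}_j, \qquad j = 1, 2, \ldots, N,$$

and

$$\omega_{-j} = -\omega_j. \tag{7}$$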

The prime on the summation sign in Eq. (6) indicates that there is no j = 0
term in the summation. For each n and every choice of times $t_1 < t_2 < t_3 < \cdots < t_n$,

$$X(t_i) = \sum_{j=1}^{N} (a_j \cos \omega_j t_i + b_j \sin \omega_j t_i), \qquad i = 1, 2, \ldots, n \tag{8}$$


is a linear combination of the independent zero mean Gaussian random
variables $a_j, b_j$, j = 1, 2, ..., N. Thus, the $X(t_1), X(t_2), \ldots, X(t_n)$ are distributed
with an n-variate zero mean multivariate Gaussian distribution and X(t)
consequently is a zero mean Gaussian random function. To verify that X(t) is
stationary one must show that the autocovariance function $E\,X(t + \tau)X(t)$ depends
solely on $\tau$. From Equations (5) and (7) one has by direct computation


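$$E\,c_j \bar{c}_k = \begin{cases} 0 & \text{if } j \neq k \\ \tfrac{1}{2}\sigma_j^2 & \text{if } j = k \end{cases} \tag{9}$$

where

$$\sigma_{-j}^2 = \sigma_j^2.$$

Thus, from Eq. (6) one has

$$\begin{aligned}
E\,X(t + \tau)X(t) &= E\,X(t + \tau)\overline{X(t)} \\
&= E\Big({\sum_{j=-N}^{N}}' c_j e^{i\omega_j (t+\tau)}\Big)\Big({\sum_{k=-N}^{N}}' \bar{c}_k e^{-i\omega_k t}\Big) \\
&= {\sum_{j=-N}^{N}}'\,{\sum_{k=-N}^{N}}' (E\,c_j \bar{c}_k)\, e^{i\omega_j (t+\tau) - i\omega_k t} \\
&= {\sum_{j=-N}^{N}}' \tfrac{1}{2}\sigma_j^2 e^{i\omega_j \tau} = \sum_{j=1}^{N} \sigma_j^2 \cos \omega_j \tau,
\end{aligned} \tag{10}$$

that is,

$$E\,X(t + \tau)X(t) = R(\tau) = \sum_{j=1}^{N} \sigma_j^2 \cos \omega_j \tau. \tag{11}$$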


Eq. (11) demonstrates that the autocovariance function depends solely on $\tau$
and exhibits the autocovariance function $R(\tau)$ as the Fourier cosine transform
of the spectrum. The relation between the autocovariance function $R(\tau)$ and
the spectrum is called the Wiener-Khintchine relation. It has now been
established that the random function X(t) given by Eq. (4) is a zero mean stationary
Gaussian random function.
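
The result of Eq. (11) can be checked numerically for a small example. In the Python sketch below the three frequencies and variances are hypothetical, as are the function names. The exact covariance is computed from the independence of the coefficients and compared with the cosine transform of the spectrum at several times t, and a Monte Carlo average over simulated realizations of Eq. (4) is added as a rough check.

```python
import math
import random

omegas = [0.5, 1.3, 2.1]         # hypothetical frequencies omega_j
variances = [1.0, 0.5, 0.25]     # hypothetical spectrum values sigma_j^2


def exact_covariance(t, tau):
    """E X(t + tau) X(t), using independence of the a_j and b_j."""
    return sum(
        v * (math.cos(w * (t + tau)) * math.cos(w * t)
             + math.sin(w * (t + tau)) * math.sin(w * t))
        for w, v in zip(omegas, variances)
    )


def autocovariance(tau):
    """R(tau) as the cosine transform of the spectrum, Eq. (11)."""
    return sum(v * math.cos(w * tau) for w, v in zip(omegas, variances))


def sample_product(t, tau, rng):
    """X(t + tau) * X(t) for one freshly drawn set of coefficients."""
    a = [rng.gauss(0.0, math.sqrt(v)) for v in variances]
    b = [rng.gauss(0.0, math.sqrt(v)) for v in variances]

    def x(s):
        return sum(aj * math.cos(w * s) + bj * math.sin(w * s)
                   for aj, bj, w in zip(a, b, omegas))

    return x(t + tau) * x(t)


rng = random.Random(0)
t, tau = 3.0, 0.7
monte_carlo = sum(sample_product(t, tau, rng) for _ in range(20000)) / 20000
print(autocovariance(tau), exact_covariance(t, tau), monte_carlo)
```

The exact covariance does not change with t, which is the stationarity established above; the simulated average agrees only to within sampling error.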

Consider now a converse question. Suppose it is known that X(t) is a zero
mean stationary Gaussian random function and that it has a finite discrete
Fourier representation as given by Eq. (4). Does it then follow that the Fourier
coefficients $a_j, b_j$, j = 1, 2, ..., N are independent zero mean Gaussian random
variables with variances

$$E a_j^2 = E b_j^2 = \sigma_j^2, \qquad j = 1, 2, \ldots, N? \tag{12}$$

The answer to the question is yes but a proof will not be given.

Not all zero mean stationary Gaussian random functions have a finite dis- 
crete Fourier representation as in Eq. (4). However, all zero mean stationary 
Gaussian random functions can be obtained as a limit (in a suitable sense) of a 
sequence of random functions having such a finite discrete Fourier representa- 
tion. As a result one has that the general zero mean stationary Gaussian random 
function X(t) has the spectral representation analogous to Eq. (4) 


$$X(t) = \int_0^\infty [\cos \omega t \, dU(\omega) + \sin \omega t \, dV(\omega)] \tag{13}$$

where one can heuristically regard the differentials $dU(\omega)$, $dV(\omega)$, $0 < \omega < \infty$ as
independent zero mean Gaussian random variables with variances (analogous to
Eq. (12))

$$E(dU(\omega))^2 = E(dV(\omega))^2 = dS(\omega). \tag{14}$$

Figure 3. Spectral Density of Random Function X(t) of Eq. (13).


The function $S(\omega)$ is a bounded monotone non-decreasing function called the
spectral distribution function of the random function X(t). Analogous to Eq.
(11) one has the autocovariance function

$$R(\tau) = \int_0^\infty \cos \omega\tau \, dS(\omega). \tag{15}$$

The case usually considered is the one where $dS(\omega)$ is infinitesimal and

$$dS(\omega) = s(\omega)\, d\omega. \tag{16}$$

The function $s(\omega)$ is called the spectral density of the random function X(t) and
is furthermore usually assumed to be continuous.

A spectral density is displayed graphically in Fig. 3.
When Eq. (16) holds Eq. (15) becomes

$$R(\tau) = \int_0^\infty \cos \omega\tau \, s(\omega)\, d\omega, \tag{17}$$

and the differentials $dU(\omega)$, $dV(\omega)$, $0 < \omega < \infty$ can be regarded as independent
infinitesimal zero mean Gaussian random variables with variances

$$E(dU(\omega))^2 = E(dV(\omega))^2 = s(\omega)\, d\omega. \tag{18}$$


The spectral representation exhibits the random function X(t) as a random
linear combination of the sinusoids $\cos \omega t$, $\sin \omega t$, $0 < \omega < \infty$.
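
As a numerical illustration of the cosine-transform relation between a spectral density and the autocovariance function, the Python sketch below evaluates the integral of cos(ωτ) s(ω) for one hypothetical density, s(ω) = (2/π) α/(α² + ω²), whose transform is known in closed form to be exp(−α|τ|). The density, the truncation point and the step size are all choices made here for illustration.

```python
import math


def spectral_density(omega, alpha=1.0):
    """Hypothetical density s(omega) = (2 / pi) * alpha / (alpha**2 + omega**2)."""
    return (2.0 / math.pi) * alpha / (alpha * alpha + omega * omega)


def autocovariance(tau, upper=2000.0, step=0.01):
    """Trapezoidal approximation to the integral of cos(omega*tau)*s(omega) on (0, upper)."""
    count = int(round(upper / step))
    total = 0.5 * (spectral_density(0.0)
                   + math.cos(upper * tau) * spectral_density(upper))
    for i in range(1, count):
        omega = i * step
        total += math.cos(omega * tau) * spectral_density(omega)
    return total * step


for tau in (0.0, 1.0):
    print(tau, autocovariance(tau), math.exp(-tau))
```

The small remaining discrepancy comes from truncating the integral at a finite upper frequency.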

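It is not absolutely essential to insist on the existence of a spectral density
$s(\omega)$. For example, the X(t) of Eq. (4) has

$$\begin{aligned}
dU(\omega_j) &= a_j, \quad dV(\omega_j) = b_j, \qquad j = 1, 2, \ldots, N, \\
dU(\omega) &= 0, \quad dV(\omega) = 0 \quad \text{for } \omega \neq \omega_j, \; j = 1, 2, \ldots, N,
\end{aligned} \tag{19}$$

so that

$$E(dU(\omega_j))^2 = E(dV(\omega_j))^2 = \sigma_j^2, \qquad j = 1, 2, \ldots, N.$$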


In general, one may have the dS(w) finite for some w and infinitesimal for other 
w. The case where a spectral density s(w) exists, even though it is not the most 
general case mathematically, is sufficiently general for practical purposes. One 
may interpret practical purposes as meaning that only finite lengths of record 
are observed or considered. For example, if the spectral density s(w) is nearly 
the Dirac delta function at frequencies w, < w, < -++ < wy and zero at other 
frequencies, one has a case where the spectral density s(w) approximates the 
“line spectrum” of Fig. 2. On the other hand, for practical purposes the random 
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function of Eq. (4) is sufficiently general if one permits N to be arbitrarily large. 
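
The sense in which a narrow-band density stands in for a line spectrum over a finite record can be sketched numerically. The code below is an illustration under stated assumptions: each line of variance σ_j² is smeared into a flat density of width `band` around ω_j, which multiplies the line-spectrum autocovariance Σ σ_j² cos ω_j τ by sin(band·τ/2)/(band·τ/2). The frequencies, variances and band width are hypothetical values chosen only for the demonstration.

```python
import math

def line_autocovariance(tau, freqs, variances):
    """R(tau) for a line spectrum: sum of sigma_j^2 * cos(w_j * tau)."""
    return sum(v * math.cos(w * tau) for w, v in zip(freqs, variances))

def banded_autocovariance(tau, freqs, variances, band):
    """R(tau) when every line is spread into a flat density of width `band`."""
    x = 0.5 * band * tau
    taper = math.sin(x) / x if x != 0.0 else 1.0
    return taper * line_autocovariance(tau, freqs, variances)

freqs = [0.5, 1.3, 2.0]        # hypothetical line frequencies
variances = [1.0, 0.5, 2.0]    # hypothetical sigma_j^2
for tau in (0.0, 10.0, 1000.0, 6000.0):
    gap = abs(line_autocovariance(tau, freqs, variances)
              - banded_autocovariance(tau, freqs, variances, band=1e-3))
    print(tau, gap)
```

For lags short compared with 1/band the two autocovariances are practically indistinguishable; they separate only at lags that a finite record of moderate length would never reach.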

Spectral analysis of time series or the theory of spectral estimation is concerned 
with the problem of statistically estimating the spectral density s(ω) from a 
finite realization (sample) or finite realizations of X(t). It is more accurate to 
say that spectral estimation is concerned with attempts to statistically estimate 
the spectral density s(ω), since the spectral density itself cannot be estimated 
without bias from finite realizations. 

Up to now only zero mean stationary Gaussian random functions X(t) have 
been discussed. Spectral analysis as defined above is, however, appropriate for 
a larger class of random functions: the random functions that are weakly 
stationary. A zero mean weakly stationary random function X(t) has only its 
second moments invariant under time translation, i.e. for each n and each τ 
and every choice of times $t_1 < t_2 < \cdots < t_n$, 
$X(t_1 + \tau), X(t_2 + \tau), \ldots, X(t_n + \tau)$ 
has the same covariance matrix as $X(t_1), X(t_2), \ldots, X(t_n)$. It is clear that a 
zero mean weakly stationary Gaussian random function is stationary since the 
zero mean multivariate Gaussian distribution function is determined by its 
covariance matrix. A zero mean weakly stationary random function X(t) has a 
spectral representation 

$$X(t) = \int_0^\infty [\cos \omega t \; dU(\omega) + \sin \omega t \; dV(\omega)] \tag{20}$$

where in contrast to Eq. (13) the differentials dU(ω), dV(ω), 0 < ω < ∞ in 
Eq. (20) are uncorrelated (not in general independent) zero mean random 
variables with variances given by Eq. (14). When Eq. (16) holds the weakly 
stationary random function X(t) has a spectral density s(ω). The Gaussian assumption 
is needed in order to deal with the sampling variability of estimators. In these 
comments attention is restricted (for definiteness) to the Gaussian case. 

In discussing the problem of statistically estimating the spectral density 
s(ω) from a finite realization of X(t), it is illuminating to also consider the 
analogous problem of statistically estimating the spectrum of the random function 
X(t) of Eq. (4). Since X(t) of Eq. (4) has a finite discrete Fourier representation 
the problem of statistically estimating the spectrum is one of estimating variance 
components. Suppose that a finite realization of X(t) of Eq. (4), say X(t), 
−T < t < T, is observed. One wishes to estimate the spectrum, i.e. 
estimate the variances $\sigma_j^2$, j = 1, 2, ..., N of Eq. (5). From Eq. (4) it is seen 
that X(t) for each t is a linear combination of the Fourier coefficients $a_j$, $b_j$, 
j = 1, 2, ..., N. Only 2N Fourier coefficients appear in Eq. (4). If 2N times 
$t_i$, i = 1, 2, ..., 2N are chosen appropriately the 2N linear equations 
expressing the $X(t_i)$, i = 1, 2, ..., 2N in terms of the Fourier coefficients $a_j$, $b_j$, 
j = 1, 2, ..., N are linearly independent. Thus, the Fourier coefficients 
$a_j$, $b_j$, j = 1, 2, ..., N can be expressed as linear combinations of the $X(t_i)$, 
i = 1, 2, ..., 2N. From a sufficiently long finite realization X(t), −T < t < T 
one can thus determine the sample Fourier coefficients $a_j$, $b_j$, j = 1, 2, ..., N. 
As an estimate of the spectrum $\sigma_j^2$, j = 1, 2, ..., N one naturally takes 

$$\hat\sigma_j^2 = \tfrac{1}{2}(a_j^2 + b_j^2), \qquad j = 1, 2, \ldots, N. \tag{21}$$

Since the Fourier coefficients $a_j$, $b_j$, j = 1, 2, ..., N are independent zero 
mean Gaussian random variables with variances $\sigma_j^2$, j = 1, 2, ..., N 
respectively, one has that $2\hat\sigma_j^2/\sigma_j^2$, j = 1, 2, ..., N are independent $\chi^2$ random 
variables each with two degrees of freedom. One should notice that even with an 
infinite realization X(t), −∞ < t < ∞ one can do no more than determine the 
sample Fourier coefficients $a_j$, $b_j$, j = 1, 2, ..., N. Thus, even with an infinite 
realization X(t), −∞ < t < ∞ one has $2\hat\sigma_j^2/\sigma_j^2$ distributed $\chi_2^2$. 

Now consider a finite realization −T < t < T of X(t) of Eq. (13). It is assumed 
that a spectral density s(ω) exists, i.e. it is assumed that Eq. (16) holds. Consider now 


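$$\begin{aligned}
\Delta U(\omega_j) &= \int_0^T K_{c,\omega_j}(t)\, \frac{X(t) + X(-t)}{2}\, dt, \\
\Delta V(\omega_j) &= \int_0^T K_{s,\omega_j}(t)\, \frac{X(t) - X(-t)}{2}\, dt.
\end{aligned} \tag{22}$$

It is intended that $K_{c,\omega_j}(t)$, $K_{s,\omega_j}(t)$ be chosen so that $\Delta U(\omega_j)$, $\Delta V(\omega_j)$ 
approximate $dU(\omega_j)$, $dV(\omega_j)$ respectively. From Eq. (13) 

$$\begin{aligned}
\frac{X(t) + X(-t)}{2} &= \int_0^\infty \cos \omega t \; dU(\omega), \\
\frac{X(t) - X(-t)}{2} &= \int_0^\infty \sin \omega t \; dV(\omega).
\end{aligned} \tag{23}$$

Thus, from Equations (23) and (22) 

$$\begin{aligned}
\Delta U(\omega_j) &= \int_0^T K_{c,\omega_j}(t) \left[ \int_0^\infty \cos \omega t \; dU(\omega) \right] dt \\
&= \int_0^\infty \left( \int_0^T K_{c,\omega_j}(t) \cos \omega t \; dt \right) dU(\omega) \\
&= \int_0^\infty F_{c,\omega_j}(\omega)\, dU(\omega)
\end{aligned} \tag{24}$$

where 

$$F_{c,\omega_j}(\omega) = \int_0^T K_{c,\omega_j}(t) \cos \omega t \; dt. \tag{25}$$

Similarly, 

$$\Delta V(\omega_j) = \int_0^\infty F_{s,\omega_j}(\omega)\, dV(\omega) \tag{26}$$

where 

$$F_{s,\omega_j}(\omega) = \int_0^T K_{s,\omega_j}(t) \sin \omega t \; dt. \tag{27}$$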


The functions $F_{c,\omega_j}(\omega)$ and $F_{s,\omega_j}(\omega)$ are called filters. Notice that if the filters 
$F_{c,\omega_j}(\omega)$, $F_{s,\omega_j}(\omega)$ were Dirac delta functions centered at $\omega = \omega_j$, one would have 
from Equations (24) and (26) $\Delta U(\omega_j) = dU(\omega_j)$, $\Delta V(\omega_j) = dV(\omega_j)$. The $\Delta U(\omega_j)$, 
$\Delta V(\omega_j)$ would then be the direct analogs of the sample Fourier coefficients 
$a_j$, $b_j$ previously discussed. Furthermore, one would have that $\Delta U(\omega_j)$, $\Delta V(\omega_j)$ 
are independent zero mean Gaussian random variables with a common variance 
$s(\omega_j)\,d\omega$. As an estimate of $s(\omega_j)\,d\omega$ one would naturally take 

$$\hat s(\omega_j)\, d\omega = \tfrac{1}{2}[(\Delta U(\omega_j))^2 + (\Delta V(\omega_j))^2], \tag{28}$$

and one would have $2\hat s(\omega_j)/s(\omega_j)$ a $\chi_2^2$ random variable. Furthermore, for two 
distinct frequencies $\omega_j$, $\omega_{j'}$, one would have $\hat s(\omega_j)$ and $\hat s(\omega_{j'})$ independent. Let 
us now be more realistic, and consider the filters $F_{c,\omega_j}(\omega)$, $F_{s,\omega_j}(\omega)$ approximating 
delta functions centered at $\omega = \omega_j$ that are attainable with a finite T. In the 
frequency range ω > 0 it is possible to have $F_{c,\omega_j}(\omega)$ very nearly equal $F_{s,\omega_j}(\omega)$ 
and attain a filter $F_{\omega_j}(\omega)$ as sketched in Fig. 4. The smallest attainable 
bandwidth B is of the order of 4π/T. 

Figure 4. Sketch of Attainable Filter $F_{\omega_j}(\omega)$. 
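
To make the order of magnitude 4π/T concrete, the following sketch evaluates the cosine filter integral numerically for one hypothetical weight function, K(t) = (1 − t/T) cos ω_j t on 0 ≤ t ≤ T. This kernel, the function names and the numerical values are assumptions made for illustration; the text does not specify a particular kernel. For this choice the filter is close to (T/4)[sin(δT/2)/(δT/2)]² with δ = ω − ω_j, so its main lobe runs from about ω_j − 2π/T to ω_j + 2π/T, a width of about 4π/T.

```python
import math

def kernel(t, w_j, T):
    """Hypothetical weight function: a cosine at w_j with a triangular taper on [0, T]."""
    return (1.0 - t / T) * math.cos(w_j * t)

def filter_response(w, w_j, T, steps=4000):
    """Composite Simpson approximation of the integral of K(t) cos(w t) over [0, T].

    `steps` must be even.
    """
    h = T / steps
    total = kernel(0.0, w_j, T) + kernel(T, w_j, T) * math.cos(w * T)
    for k in range(1, steps):
        t = k * h
        weight = 4.0 if k % 2 else 2.0
        total += weight * kernel(t, w_j, T) * math.cos(w * t)
    return total * h / 3.0

T = 100.0
w_j = 1.0
peak = filter_response(w_j, w_j, T)                        # close to T / 4
edge = filter_response(w_j + 2.0 * math.pi / T, w_j, T)    # near the first zero
print(peak, edge)
```

Doubling T in this sketch halves the distance from the peak to the first zero, which is the trade-off between record length and resolution described above.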

The filter sketched in Fig. 4 approximates a triangular band pass filter. It is 
possible to also approximate a rectangular band pass filter and other band pass 
filters. One always has the smallest attainable bandwidth B of the order 4π/T. 
If (1) the filters $F_{c,\omega_j}(\omega)$, $F_{s,\omega_j}(\omega)$ were exactly equal to $F_{\omega_j}(\omega)$, (2) the filter 
$F_{\omega_j}(\omega)$ vanished outside the band $\omega_j - \tfrac{1}{2}B < \omega < \omega_j + \tfrac{1}{2}B$ and (3) the spectral 
density s(ω) were constant over the band $\omega_j - \tfrac{1}{2}B < \omega < \omega_j + \tfrac{1}{2}B$, then it 
would follow from Equations (24) and (26) that $\Delta U(\omega_j)$, $\Delta V(\omega_j)$ are independent 
zero mean Gaussian random variables with a common variance $C_F\, s(\omega_j)$ where 
the constant $C_F$ is given by 

$$C_F = \int_{\omega_j - B/2}^{\omega_j + B/2} F_{\omega_j}^2(\omega)\, d\omega. \tag{29}$$

Thus, it is natural to take 

$$\hat s(\omega_j) = \frac{1}{2 C_F} [(\Delta U(\omega_j))^2 + (\Delta V(\omega_j))^2] \tag{30}$$

as an estimator for $s(\omega_j)$ and one has $2\hat s(\omega_j)/s(\omega_j)$ a $\chi_2^2$ random variable. Can 
one obtain estimators of either $\sigma_j^2$ or analogously $s(\omega_j)$ that have more degrees 
of freedom? In general one cannot. However, if it is known that some of the 
$\sigma_j^2$ are equal, say 

$$\sigma_{j-m}^2 = \cdots = \sigma_{j-1}^2 = \sigma_j^2 = \sigma_{j+1}^2 = \cdots = \sigma_{j+m}^2, \tag{31}$$

then it is natural to take as an estimator for $\sigma_j^2$ 

$$\hat\sigma_{m,j}^2 = \frac{1}{2m+1} \sum_{k=-m}^{m} \hat\sigma_{j+k}^2. \tag{32}$$

Clearly, $2(2m+1)\hat\sigma_{m,j}^2/\sigma_j^2$ is then $\chi_{2(2m+1)}^2$ distributed. 
Analogously, if it is known that the spectral density s(ω) is constant in the 
frequency range $\omega_j - (m + \tfrac{1}{2})B < \omega < \omega_j + (m + \tfrac{1}{2})B$ then (assuming 
consecutive $\omega_j$ separated by a frequency interval B) it is natural to take 

$$\hat s_m(\omega_j) = \frac{1}{2m+1} \sum_{k=-m}^{m} \hat s(\omega_{j+k}) \tag{33}$$

as an estimator for $s(\omega_j)$ and $2(2m+1)\hat s_m(\omega_j)/s(\omega_j)$ is then $\chi_{2(2m+1)}^2$ distributed. 
Let us now again be realistic. Attainable filters $F_{c,\omega_j}(\omega)$, $F_{s,\omega_j}(\omega)$ are not exactly 
equal and do not exactly vanish outside a bandwidth B. Furthermore, the 
spectral density s(ω) is seldom if ever exactly constant in any frequency range. 
Thus $2(2m+1)\hat s_m(\omega_j)/s(\omega_j)$ is $\chi_{2(2m+1)}^2$ distributed to the degree that the 
assumptions stated above prevail. On taking expectations in Eq. (32) one has 

$$E\,\hat\sigma_{m,j}^2 = \frac{1}{2m+1} \sum_{k=-m}^{m} \sigma_{j+k}^2 \tag{34}$$

so that even if Eq. (31) does not prevail one can interpret $\hat\sigma_{m,j}^2$ as an estimator 
for the variance average that appears on the right hand side of Eq. (34). One can 
similarly interpret $\hat s_m(\omega_j)$ as an estimator of an analogous spectral average. 
Furthermore, $\hat\sigma_{m,j}^2$ (divided by an appropriate constant) is to a good 
approximation still a $\chi^2$ random variable but with fewer degrees of freedom than 
2(2m + 1). An analogous statement holds for $\hat s_m(\omega_j)$. 
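
The gain in degrees of freedom from pooling can be checked by simulation. The sketch below is illustrative only: it takes each single-frequency estimate to be (a_j² + b_j²)/2, the form consistent with the two-degree-of-freedom statement above, assumes all 2m + 1 variances are equal as in Eq. (31), and uses hypothetical values for the variance, m, the number of trials and the seed. A χ² variable with ν degrees of freedom has mean ν and variance 2ν, so the scaled pooled estimate should show a mean near 2(2m + 1) and a variance near 4(2m + 1).

```python
import random

def pooled_estimate(sigma2, m, rng):
    """Average of 2m+1 single-frequency estimates (a^2 + b^2)/2, as in Eq. (32)."""
    sd = sigma2 ** 0.5
    total = 0.0
    for _ in range(2 * m + 1):
        a = rng.gauss(0.0, sd)
        b = rng.gauss(0.0, sd)
        total += 0.5 * (a * a + b * b)
    return total / (2 * m + 1)

def scaled_moments(sigma2=3.0, m=2, trials=20000, seed=1961):
    """Sample mean and variance of 2(2m+1) * pooled estimate / sigma^2."""
    rng = random.Random(seed)
    dof = 2 * (2 * m + 1)
    values = [dof * pooled_estimate(sigma2, m, rng) / sigma2 for _ in range(trials)]
    mean = sum(values) / trials
    var = sum((v - mean) ** 2 for v in values) / (trials - 1)
    return dof, mean, var

dof, mean, var = scaled_moments()
print(dof, mean, var)
```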

The previous discussion presented one view of statistical analysis of station- 
ary time series. Attention was centered on estimating a spectrum or spectral 
density. Why not an autocovariance function? Why is the frequency domain 
preferred to the time domain? The reason is largely one of statistical simplicity. 
For ω ≠ ω′, dU(ω), dV(ω) and dU(ω′), dV(ω′) are independent or uncorrelated 
whereas for t ≠ t′, X(t) and X(t′) are in general dependent or correlated. In the 
frequency domain one has independent or uncorrelated random variables whereas 
in the time domain one has dependent or correlated random variables. 

These comments conclude with a remark on ergodicity. One may interpret 
an ergodic Gaussian random function as being one which allows its statistical 
structure to be exactly determined from a single infinite realization. From 
earlier remarks the random function X(t) given by the finite discrete Fourier 
representation in Eq. (4) is thus not ergodic. The random function of Eq. (4) 
is sufficiently general for practical spectral analysis if one permits N to be arbi- 
trarily large and only finite lengths of record are observed or considered. 
Ergodicity is unimportant for practical spectral analysis of Time Series. 


Comments on the Discussions of Messrs. Tukey and Goodman 


G. M. JENKINS: 


We are all grateful to Professor Tukey and Dr. Goodman for their comments 
and constructive criticism. Since Dr. Goodman has not chosen to discuss any 
of the issues raised in the two papers, I shall confine my reply to Professor 
Tukey’s lengthy contribution which contains a wealth of knowledge and a 
depth of understanding of the subject. He has based much of his discussion on 
the analogy between spectral analysis and the components of variance aspect 
of the analysis of variance. There is clearly some justification for making this 
analogy but I would have thought that the differences are sufficiently great to 
make it unwise to pursue it too far. In general, the variances which appear in 
the expected values of the mean squares in a components of variance analysis 
will correspond to populations which we will have deliberately sampled and 
for the estimation of which we will have designed a specific experimental ar- 
rangement. When it comes to spectrum analysis, we have in theory an infinite 
number of components of variance from a sample, no matter how small. In 
general we have no a priori preference for particular components of variance 
but we are interested in as accurate a picture of the spectral curve as is possible. 
In other situations, we may not want all this detail; for example in designing 
an experiment to estimate the slope of a response surface when the errors are 
autocorrelated, we will only be interested in those frequencies for which the 
spectral density is smallest; in many other physical applications it is sufficient 
to give the average spectral density over fairly wide bands, presenting finally 
a picture looking exactly like a histogram. In all these situations, the choice 
of components of variance is much more arbitrary than in the classical com- 
ponents of variance situation. However, spectral analysis is usually a part of 
a components of variance analysis in the usual sense. This is due to the fact 
that one is not usually interested in single spectra; in many cases, spectrum 
analysis will form part of a much larger experimental programme. We might 
for example, be investigating the effect of atmospheric turbulence on aircraft 
structures and an experiment may have been planned using different pilots 
and several aircraft flown at each of several heights on a number of different 
days. Instead of one response in each cell of the experiment, we now have a 
time-series which we may replace by its spectrum. The variation in the esti- 
mated spectra may then be compared from one set of experimental conditions 
to the other and we might isolate components of variance for pilots, aircraft 
and days. There is a much greater degree of flexibility here than in the usual 
set-up since components of variance may be isolated for a number of frequency 
bands. 

Professor Tukey has also raised the difference in thinking involved in regarding 
the data as being composed of a mixture of sine waves (harmonic analysis) 
and as a random function of time (spectrum analysis). The distinction is an 
important one and whilst most of the discussion has been focused on the latter 
approach in these papers, it should not be forgotten that there are many genuine 
applications of the former when the physical nature of the fundamental fre- 
quency is well established, as for example with the period of rotation of the 
earth and the moon in geophysical and astronomical phenomena. It is custom- 
ary in many such applications to use what is known as a harmonic dial on which 
the amplitude and phase of the harmonic component corresponding to a par- 
ticular frequency is recorded as a two-dimensional vector so that its variation 
over time and space can be followed in a simple pictorial manner. 

In connection with the ‘largest versus the rest’ tests of harmonic analysis, 
Professor Tukey seems to imply that this line of development came to an end 
with Fisher’s publication of the g-distribution in 1927. However, there has been 
considerable criticism of this approach by a number of people since that time, 
notably in H. O. Hartley’s paper mentioned in the references at the end of my 
paper. Recently this test has been modified by P. Whittle and M. B. Priestley 
so as to take account of the fact that the error component or $y(t)_{\text{random}}$ in 
Professor Tukey’s notation may have a nonuniform spectrum. Quite apart from 
whether $y(t)_{\text{random}}$ has a uniform spectrum or not, there is a fundamental 
difficulty here which is a consequence of the logical inconsistencies which result by 
taking tests of significance too seriously and is common to such problems as 
the analysis of $2^n$ tables, polynomial regression and harmonic analysis. Fisher’s 
“r’th largest harmonic versus the rest” test is similar in principle to taking the 
largest effects in a $2^n$ experiment and comparing them with a pooled residual. 
This suggests that the highly satisfactory solution to the $2^n$ problem given by 
Cuthbert Daniel and Allan Birnbaum in a recent issue of this journal could 
easily be modified so as to deal with classical harmonic analysis. In fact, all we 
have to do is to rank a set of $\chi_2^2$ variables as opposed to $\chi_1^2$ variables in the $2^n$ 
situation. 

I think that I agree with Professor Tukey's philosophy as far as stationarity 
is concerned and his related remarks about the desirability in certain situations 
of producing spectra although these may be obtained from records which ‘look’ 
as if they were non-stationary. There is no serious drawback to this approach 
as an exploratory tool but I would have thought that it was worth while paying 
some attention to the way in which this averaging was done. Thus, with the 
turbulence investigation mentioned above, it would be unwise to produce 
spectra from records which are too long since it is known that for most turbu- 
lence phenomena, whereas the spectral shape remains constant, the power at 
a particular frequency may vary considerably over time and space. It would 
be reasonable to assume that the total variance over short stretches varied 
itself as a stationary stochastic process so that the whole series could be regarded 
as being stationary in this sense. It would nevertheless be much more 
logical to produce spectra for shorter lengths and then average the different 
spectra thereby obtained, producing finally a mean spectrum and the standard 
deviation of the power or logarithm of power at each frequency. 
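
The averaging step just described can be sketched as follows. The function name, the use of base-10 logarithms and the sample numbers are assumptions for illustration; the spectra for the short stretches are taken as already computed on a common set of frequencies.

```python
import math

def summarize_spectra(spectra):
    """Mean power and standard deviation of log10 power at each frequency.

    spectra: list of equal-length lists of positive power values,
    one list per short stretch of record (at least two stretches).
    """
    n_seg = len(spectra)
    n_freq = len(spectra[0])
    mean_power, sd_log_power = [], []
    for j in range(n_freq):
        column = [s[j] for s in spectra]
        mean_power.append(sum(column) / n_seg)
        logs = [math.log10(p) for p in column]
        mean_log = sum(logs) / n_seg
        var_log = sum((x - mean_log) ** 2 for x in logs) / (n_seg - 1)
        sd_log_power.append(math.sqrt(var_log))
    return mean_power, sd_log_power

# Hypothetical spectra from three short stretches at two frequencies.
example = [[1.0, 10.0], [1.0, 1000.0], [1.0, 100.0]]
mean_power, sd_log_power = summarize_spectra(example)
print(mean_power, sd_log_power)
```

A large standard deviation of log power at a frequency, as at the second frequency in this made-up example, signals that the power there varies strongly from stretch to stretch even if the spectral shape is stable.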

Professor Tukey has drawn attention to the one point on which we seem to 
disagree mostly, namely on the choice of the bandwidth of the spectral window. 
We are agreed that there is no hard and fast rule and even if there were one, 
it would almost certainly be wrong. My criticisms of the fixed bandwidth 
method stem from a genuine inability to apply this approach in practice. Since 
we both agree that several choices of bandwidth may be necessary before one is 
satisfied and further, that different bandwidths may be appropriate to different 
parts of the frequency range, I do not think that these differences are very 
great. My suggestion of plotting the autocorrelation function for lags up to say 
30% of the total number of observations seems to have two advantages. 

(a) The autocorrelation function may behave very well and give us valuable 
information about the initial choice of bandwidth or truncation point. 

(b) Since the bulk of the computation in spectral analysis arises in the calcu- 
lation of the autocorrelations, these may be evaluated once and for all. Since 
the transition from autocorrelations to spectra is a trivial one computationally, 
it is perfectly easy in a very short space of time to run off several spectra for 
the same data corresponding to different choices of bandwidth. Professor Tukey, 
on the other hand, advocates going straight from data to spectra and suggests 
that this process should be repeated each time a change of bandwidth is made. 
This strikes me as being a very expensive exercise computationally if several 
choices of bandwidth are made eventually. Since Professor Tukey has laid so 
much emphasis on the cost of computing in his remarks, I find these comments 
rather puzzling. On the other hand he seems to be concerned that Professor 
Parzen’s spectral window involves the calculation of a few more autocorrelations 
to achieve a comparable bandwidth to the ‘hanning’ window but is quite 
prepared to duplicate the calculation of the autocorrelations. 
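
A rough operation count illustrates the computational point. The sketch below counts multiplications only, under simplifying assumptions that are not from the text: lagged products are formed directly for lags 0 to `max_lag`, each spectrum is a weighted cosine transform evaluated at `n_freq` frequencies, and every bandwidth choice uses the same `max_lag`. The record length and other numbers are hypothetical.

```python
def lagged_product_multiplications(n, max_lag):
    """Multiplications for the lagged sums at lags 0..max_lag of an n-point record."""
    return sum(n - s for s in range(max_lag + 1))

def cosine_transform_multiplications(max_lag, n_freq):
    """Multiplications for one weighted cosine transform at n_freq frequencies."""
    return max_lag * n_freq

def total_cost(n, max_lag, n_freq, n_bandwidths, reuse_autocovariances):
    """Total multiplications for n_bandwidths spectra, with or without reuse."""
    lagged = lagged_product_multiplications(n, max_lag)
    transform = cosine_transform_multiplications(max_lag, n_freq)
    if reuse_autocovariances:
        return lagged + n_bandwidths * transform
    return n_bandwidths * (lagged + transform)

once = total_cost(1000, 100, 101, 3, reuse_autocovariances=True)
repeated = total_cost(1000, 100, 101, 3, reuse_autocovariances=False)
print(once, repeated)
```

In this made-up case the lagged products dominate the cost, so evaluating them once and reusing them for three bandwidths takes well under half the multiplications of starting from the data each time.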

Professor Tukey is correct when he says that there is no advantage mathemati- 
cally in calculating spectra at points other than at w; = 2j/m(j = 1, 2, --- m) 
where m is the number of autocorrelations, since the errors are only those of 
interpolation. However, if a peak or trough were to occur at a point exactly 
half-way between these frequencies, then the spectral estimates at these two 
frequencies could under- or overestimate this amplitude by a factor of π²/4 which 
is approximately 2.5. Most computer programmes for spectral analysis in 
Britain give complete freedom of choice to the user as to the spacing of the 
spectral estimates. A reasonable choice in my opinion would be to calculate 
spectra at a spacing corresponding to a quarter of the distance between effec- 
tively independent estimates as given in Table 2 of my paper. The extra com- 
puting time is negligible so that there is nothing to be lost by adopting this 
approach. 

I do not think that it is of any great importance to argue the rights or wrongs 
of dividing by n or n − s for the autocovariance of lag s. As Professor Tukey 
has indicated, this choice is certainly not of any great import as far as spec- 
trum analysis is concerned since this weight factor will be swamped by the 
further weighting of the autocorrelations before taking the Fourier transform. 
From the point of view of estimating the autocorrelations, the choice of divisor 
is a little more important especially if n/4 < s < n/2. I fail to see the significance 
of Professor Tukey’s remarks that if one is interested in a single autocovariance, 
one may prefer the unbiassed estimate but that this would not be 
the case if one were interested in several autocovariances. I would have thought 
that an unbiassed estimate was an anachronism nowadays and the notion of 
the degrees of freedom a historical accident. If one regards estimation as a 
problem of getting as close to the target as possible, then mean square error is 
the most obvious and simplest loss function to use—in the case of the variance 
this leads to the use of (n + 1) and not (n — 1) as divisor, as Professor G. A. 
Barnard has drawn to my attention. This is not going to be very important 
arithmetically for the variance if n is large but the use of the divisor n — s 
instead of n could be a possible explanation for the wild behaviour of the auto- 
correlations for large lag. 
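
The two arithmetic points in this paragraph can be worked through in a short sketch. It assumes n independent Gaussian observations with the mean estimated from the sample, so that the sum of squares about the mean is σ² times a χ² variable with n − 1 degrees of freedom; the function names are illustrative. Under that assumption the mean square error of the sum of squares divided by k is σ⁴[2(n − 1)/k² + ((n − 1)/k − 1)²], which is smallest at k = n + 1. The last function shows how much larger the divisor n − s makes an autocovariance estimate than the divisor n.

```python
def variance_mse_over_sigma4(n, divisor):
    """MSE / sigma^4 of sum((x - mean)^2) / divisor for n independent Gaussian values."""
    dof = n - 1                      # sum of squares is sigma^2 * chi-square(n - 1)
    bias = dof / divisor - 1.0
    variance = 2.0 * dof / divisor ** 2
    return variance + bias ** 2

def best_divisor(n, candidates):
    """Candidate divisor with the smallest mean square error."""
    return min(candidates, key=lambda d: variance_mse_over_sigma4(n, d))

def inflation(n, s):
    """Factor by which divisor n - s enlarges the divisor-n autocovariance at lag s."""
    return n / (n - s)

print(best_divisor(10, range(5, 20)))
print([round(inflation(100, s), 2) for s in (5, 25, 50)])
```

At a lag of half the record the divisor n − s doubles the estimate relative to the divisor n, which is one way to see why the autocorrelations at large lags can look wild.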

I wonder if Professor Tukey has not overemphasized the asymptotic nature 
of the theoretical results for spectrum analysis. I am quite content to regard 
a situation where there are 1,000 terms and an autocorrelation function which 
damps out reasonably after about 50 lags as being equivalent to at least 20 
independent observations and hence good enough for the asymptotic theory to 
apply. Rather than be concerned with a small sample normal theory, a more 
fruitful line of approach might be to look for spectral estimates which are robust 
with respect to some of the assumptions that we have been discussing. 

I look forward to seeing the development of the two new ideas which Professor 
Tukey has outlined, viz. complex demodulation and the use of pairs of spectral 
windows with all minor lobes positive and all minor lobes negative. 

It is regrettable that owing to the verbosity of all participants in this sym- 
posium it has not been possible to include some actual examples. It is to be 
hoped that this situation will be rectified in the near future and that more case 
studies will be published in this and other journals. 


EMANUEL PARZEN 


I desire to express my gratitude for the very stimulating discussion contributed 
by Prof. Tukey, Dr. Goodman, and Dr. Jenkins. A large number of comments 
and suggestions have been made that I am sure will be really helpful in clari- 
fying both the fundamentals and the practice of spectral analysis of time series 
and will act as an impetus to further research. I believe the aims of this sym- 
posium would be best served if in my discussion I did not discuss further the 
fundamental points which have been raised but rather attempt merely to point 
out the issues which need further discussion by considering the practice of time 
series analysis. 

It seems to me that many scientists are eager to do field work (that is, collect 
data) but reluctant to do home work (that is, analyze data). The only solution 
to this problem is to encourage the routine use of large computers for statistical 
data reduction by providing, by means of readily available programs, quick 
and easy access to the computer. By statistical data reduction is meant the 
computation from the data of statistical data characteristics [such as means, 
variances, covariances, correlations, spectra, frequency distributions, and per- 
centiles] which may be used in performing statistical inference on the data. One 
hopes that subjecting data on a routine basis to statistical data reduction will 
provide new information, and new insight, concerning the mechanisms under- 
lying the observed data. 

We must recognize, however, that, in the words of Professor Tukey in section 
VII of his discussion, “Most data analysis is going to be done by the 
unsophisticated. As statisticians we have a responsibility to package as many techniques 
as possible for safe and effective use by those who will analyze data, and who 
will not understand why the choices in the package were made wisely or 
unwisely.” 

In preparing a packaged routine for the spectral analysis of an observed time 
series one must first decide on the computing formulas that will be used. I think 
this question to be sufficiently important, and perhaps controversial, that we 
ought to have another symposium as soon as possible to thrash out the matter. 
Let me attempt to state the questions that I think the spectral analyst has to 
answer. 

Suppose that one has available a record of finite length of a time series 
{X(t), t = 1, --- , n}. Several issues immediately arise: (i) how should the 
record be screened and edited with the aim of detecting and patching erroneous 
observations; (ii) how should the record be filtered with the aim of avoiding 
aliasing errors and eliminating low frequency components which can be identified 
with trend; (iii) should the record be prewhitened (and if so, how)? 

The observed time series may be assumed to possess a spectral density function 
f(ω). As an estimate of f(ω) suppose that one considers the function f*(ω) 
defined by 

$$f^*(\omega) = \frac{1}{2\pi} R^*(0) + \frac{1}{\pi} \sum_{v=1}^{M} \cos v\omega \; h\!\left(\frac{v}{M}\right) R^*(v).$$


In order for this formula to be well defined, one must define the quantities 
R*(v), M, and h(u) which appear in it. The following issues immediately arise: 

(i) How to compute the quantities R*(v) representing estimates of the co- 
variance function of the time series. In particular one has the question of whether 
to divide the sum of lagged products by n or n − v. I don’t believe this question 
can be dismissed by saying that the choice is immaterial, since it only affects 
the shape of the window. It seems to me that if one divides by n − v then the 
spectrum one is estimating is not that corresponding to the true covariance 
function R(v) but rather the result of convoluting the Fourier transform of 
R(v) with the Fourier transform of the function of v equal to n/(n − v) for 
v = 0, ±1, ..., ±M and 0 otherwise. 

(ii) How to choose the number M of lag values at which R*(v) is to be computed 
(one may want to use several values of M). 

(iii) What lag window h(u) should one employ. There seem to be two main 
competitors (given by equations (5.4) and (5.10) in my paper). In particular, 
attention should be paid to the question of whether one desires the estimate 
f*(ω) to be non-negative. The lag window (5.10) yields non-negative estimates 
while (5.4) does not. 

(iv) At what values of ω in the interval 0 ≤ ω ≤ π should one compute the 
estimate f*(ω)? Even though the function f*(ω) is determined by its values at 
the frequencies $\omega_j = \pi j/M$, for j = 1, 2, ..., M, does it suffice to plot f*(ω) 
only at these frequencies in order to fairly represent it as a function defined at 
all frequencies ω? 
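
The choices listed in (i) to (iv) can be made explicit in a small sketch. Everything specific here is an assumption for illustration rather than a recommendation from the text: the 1/(2π) normalization, the triangular lag window h(u) = 1 − |u| used as a stand-in (it is not claimed to be either of the two windows mentioned in (iii)), the function names, and the test series. The divisor (n or n − v), the truncation point M, the lag window, and the fineness of the frequency grid are all left as arguments.

```python
import math

def covariance_estimates(x, max_lag, divisor_n=True):
    """Lagged products about the sample mean; divisor n if divisor_n, else n - v."""
    n = len(x)
    mean = sum(x) / n
    d = [value - mean for value in x]
    estimates = []
    for v in range(max_lag + 1):
        lagged = sum(d[t] * d[t + v] for t in range(n - v))
        estimates.append(lagged / (n if divisor_n else n - v))
    return estimates

def triangular_window(u):
    """Stand-in lag window h(u) = 1 - |u| on [-1, 1], zero outside."""
    return max(0.0, 1.0 - abs(u))

def spectral_estimate(r, M, h, w):
    """(1 / 2 pi) [R*(0) + 2 sum_{v=1}^{M} cos(v w) h(v / M) R*(v)]."""
    total = r[0]
    for v in range(1, M + 1):
        total += 2.0 * math.cos(v * w) * h(v / M) * r[v]
    return total / (2.0 * math.pi)

def estimate_on_grid(x, M, h, refine=1, divisor_n=True):
    """Evaluate the estimate at w = pi * k / (refine * M), k = 0, ..., refine * M."""
    r = covariance_estimates(x, M, divisor_n)
    steps = refine * M
    return [spectral_estimate(r, M, h, math.pi * k / steps) for k in range(steps + 1)]

# Hypothetical test series: two sinusoids.
x = [math.sin(0.9 * t) + 0.5 * math.sin(2.3 * t + 1.0) for t in range(120)]
coarse = estimate_on_grid(x, 12, triangular_window)            # w_j = pi j / M
fine = estimate_on_grid(x, 12, triangular_window, refine=4)    # four times finer
print(len(coarse), len(fine), min(fine))
```

With the divisor n and this triangular window the computed values stay non-negative at every frequency; switching `divisor_n` to False or substituting another lag window lets one see how each choice changes the plotted curve between the frequencies πj/M.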

Prof. Tukey, who has pioneered the practice of statistical spectral analysis, 
could suggest even more issues that need to be understood by the user of spectral 
analysis. Indeed, the writings of Tukey provide us with very illuminating and 
basic insights into all the issues which have been raised in this symposium. 

It seems to me that if statistical spectral analysis is to be performed routinely, 
then routine answers have to be made available to the foregoing questions. The 
literature of spectral analysis makes clear the difficulties involved in providing 
such routine answers. Nevertheless future research on statistical spectral analysis 
would do well to concentrate on finding them. 
Spectral Analysis Combining a Bartlett Window 
With an Associated Inner Window* 


THOMAS H. WONNACOTT 


Princeton University 


Usual methods of spectrum estimation allow very strong frequencies to affect 
the spectrum estimates at distant frequencies, because the weighting function 
(spectrum window) cannot be made identically zero over the rejected range of frequencies. 
To give bounds for this error, a pair of windows, one with positive side lobes 
(Bartlett), the other with negative side lobes (modified Bartlett), are used simultaneously. 
Thus we demonstrate the error reduction achieved by detrending and prewhitening 
an example economic time series. The details of the spectrum windows, and a 
discussion of confidence limits for the spectrum, are given in the appendix. A program 
for the IBM 650 is available. 





I. GENERAL 


It has been abundantly emphasized in this issue of Technometrics that one 
cannot estimate the spectrum at a single isolated frequency. Rather, one must 
be satisfied with estimation of a suitably averaged power, averaged over 
frequency according to a weighting function or kernel which we shall call a 
spectrum window. It would be desirable for this window to average over a band 
of frequencies. 

Yet, because no practical spectrum window can be made exactly zero 
outside a chosen band, even an attempt to directly estimate the average power 
in a band will fail [1, see “Bracketing Undesired Effects”]†. The side lobes of 
the window (the parts outside the desired band) will permit frequencies 
outside the band to affect the average value of the spectrum estimate; there will 
be window leakage. One way to allow for this window leakage is to use a pair 
of spectrum windows which bound an ideal (no side lobes) band-pass window; 
that is, roughly speaking, to use both an outer window with positive side lobes, 
and an inner window with negative side lobes, as in Fig. 1. It is necessary that 
the pair of windows bound the ideal window within the considered frequency 
band (main lobe) as well as outside it. Then, leaving aside questions of 
variability temporarily, the average values of the outer and inner powers provide 
upper and lower bounds for the ideal averaged power corresponding to the 
ideal band-pass window. The more leakage, the farther apart are these two 
bounds. Thus not only is the ideal power bracketed on the average, but the 
seriousness of possible leakage is indicated as well. 


* Prepared in connection with research at Princeton University under Contract DA 
36-034-ORD-2297 sponsored by the Office of Ordnance Research. Reproduction in whole or 
part permitted for purposes of the U. S. Government. 

† Numbers in brackets refer to bibliography at end of paper. 

[Figure 1: curves labeled outer window, inner window, and ideal band pass window, plotted against frequency with the frequency band marked.] 

Figure 1. Schematic diagram of outer, inner, and ideal band pass spectrum windows, centered 
at zero frequency. 


If, when variability is considered, confidence limits for the corresponding 
average value can be based on the upper and lower estimates, respectively, it 
is possible to give upper and lower confidence limits for the spectrum by 
combining an upper confidence limit for the average value of the upper estimate 
with a lower confidence limit for the average value of the lower estimate. The 
resulting (say α%) confidence interval, called a “confidence touch estimator” 
by Tukey [2], covers, in a certain sense¹, the spectrum with probability α%, 
no matter how oddly the spectrum behaves inside or outside the frequency 
band, so long as it is continuous. 

One possible pair of spectrum windows is shown in Fig. 2. The general features 
are demonstrated in this specific example, which corresponds to a maximum 
number of lags m = 20, and a center frequency $f_1 = 0$. The outer window 
$B^*(f, f_1)$ is the Bartlett window. The inner window $B_*(f, f_1)$ was obtained by an 
empirical modification of the Bartlett window. 

To use these windows on the spectrum of a given time series, the 
modifications are done in the lag (time) domain before cosine transformation. Using 
the notation of [3], the mean lagged products $C_r$ (corresponding to lag τ = rΔτ, 
where r = −m, −m + 1, ..., m while τ = −mΔτ, ..., mΔτ = $T_m$) are 
multiplied by the outer (Bartlett) lag window $b^*(r/m)$, and by the inner (modified 
Bartlett) lag window $b_*(r/m)$, and Fourier transforms are computed separately 
for the two windows. These lag windows are given by 

$$b^*(u) = \begin{cases} 1 - |u| & \text{for } |u| < 1 \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$
b,(u) = b*(u){1.011[1 — .6(| u| — .5) + 5(|u| — .5)* — 3(Ju| — .5)}}. (2) 
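
The two lag windows of equations (1) and (2) can be evaluated numerically as a check. The following is a minimal Python sketch, not the original Statisan code: the function names are hypothetical, the powers 3 and 5 in the inner window are read from the damaged equation (2) (they reproduce the value .777 quoted in Appendix A), and the normalization of the cosine transform (Δτ times a sum over lags, with Δτ = 1) is an assumption.

```python
import math


def outer_lag_window(u):
    """Bartlett lag window b*(u) of equation (1)."""
    return 1.0 - abs(u) if abs(u) < 1.0 else 0.0


def inner_lag_window(u):
    """Modified Bartlett lag window b_*(u) of equation (2)."""
    d = abs(u) - 0.5
    return outer_lag_window(u) * 1.011 * (1.0 - 0.6 * d + 5.0 * d ** 3 - 3.0 * d ** 5)


def spectrum_window(lag_window, f, m, dtau=1.0):
    """Cosine transform of a lag window, centered at zero frequency.

    The dtau * sum normalization is an assumption made for this sketch.
    """
    return dtau * sum(
        lag_window(r / m) * math.cos(2.0 * math.pi * f * r * dtau)
        for r in range(-m, m + 1)
    )


m = 20
print(round(inner_lag_window(0.0), 3))                   # 0.777, so b*(0) - b_*(0) = .223
print(round(spectrum_window(outer_lag_window, 0.0, m)))  # 20, the peak of the outer window
for j in (1, 2, 3):
    f = j / m  # frequencies where T_m f = j (with dtau = 1)
    print(j, spectrum_window(outer_lag_window, f, m), spectrum_window(inner_lag_window, f, m))
```

At the frequencies $T_m f = j$ the outer window vanishes (up to rounding), which is why the inner window's value there is used in Appendix A to approximate its side lobe minima.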


¹ In the spectrum-frequency plane, form the rectangle whose four sides are given by the 
upper confidence limit, the lower confidence limit, and the two end frequencies of the fre- 
quency band. Then with probability α%, this rectangle contains at least one point of the 
graph of the spectrum. 

Figure 2—Outer (Bartlett) and inner (modified Bartlett) spectrum windows, for 20 lags, and 
centered at zero frequency. 

The upper spectrum estimate at, for example, frequency $f_1$ (more exactly, [...] 
the spectrum averaged over the frequency band centered [...]) is obtained by the cosine transform 

$$[\ldots] = [\ldots] \sum_r C_r\, b^*(r/m) \cos(2\pi f_1 r\,\Delta\tau). \tag{3}$$

[...] is given by the integral of the spectrum S(f) weighted 
by the spectrum window centered at frequency $f_1$, $B^*(f, f_1)$, i.e. 

[...] 

The form of the outer window $B^*(f, f_1)$ is simplest when $f_1 = 0$, when 
it is given by the cosine transform of the lag window, 

$$B^*(f, 0) = [\ldots]$$

In general $B^*(f, f_1)$ is given by 

$$B^*(f, f_1) = \tfrac{1}{2} B^*(f - f_1, 0) + \tfrac{1}{2} B^*(f + f_1, 0) \tag{6}$$

and is thus never negative. It has peaks near $f = f_1$ and $f = -f_1$, where its 
shape differs very little from that of $B^*(f, 0)$ near $f = 0$. 

For the lower spectrum estimate all the remarks analogous to the above hold 
true, except that there is no convenient closed form for the inner spectrum 
window given by 

$$B_*(f, 0) = [\ldots]$$

Figure 3—Outer, inner, and Hamming spectrum estimates of raw European wheat prices. 


II. EXAMPLE 


Sir William Beveridge’s annual European wheat prices series running from 
1500 to 1867 [5], was chosen to provide an example of the application of the 
Bartlett pair of estimates. The actual index numbers, rather than the indexes 
of fluctuation, were used as our raw data, $x_t$. This series contained a large 
trend which was removed by weighted least squares linear regression; the resi- 
dual series we shall refer to as the detrended series $y_t$. Under the (admittedly 
unrealistic) assumption that the original series is a straight line plus independent 
errors of equal variance, the weights used would have provided the residuals 
of constant variance. Ordinary unweighted regression would give comparable 
results in the spectrum analysis. 

Figure 4—Bands of 80% confidence intervals derived from Fig. 3, for spectrum of raw European 
wheat prices. 

Figure 5—Bands of 80% confidence intervals for spectrum of detrended European wheat 
prices. 

Figure 6—Bands of 80% confidence intervals for spectrum of prewhitened-recolored European 
wheat prices. 

Since the spectrum still showed a preponderance of low frequencies, a 
prewhitening transformation in the form of a moving linear combination 

[...] 

was applied to the detrended series to adjust its spectrum 
(by the factor $|1 - .75\,e^{-i2\pi f\,\Delta\tau}|^{2}$) closer to flatness, or to “whiteness.” The effects 
of window leakage are minimized for nearly flat spectra. The spectrum esti- 
mates for $z_t$ were finally recolored by multiplication by the compensating factor 
$|1 - .75\,e^{-i2\pi f\,\Delta\tau}|^{-2}$. This procedure of prewhitening-recoloring can be equivalently 
thought of as reshaping the various windows, reducing the side lobes where 
the extraneous frequencies are very powerful, at the cost of increasing the side 
lobes where the frequencies are weak. 
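
The prewhitening and recoloring factors can be illustrated with a short Python sketch. The explicit form of the moving linear combination is not legible in the source; the filter $z_t = y_t - .75\,y_{t-1}$ used here is an assumption, chosen because its power transfer factor is the $|1 - .75\,e^{-i2\pi f\Delta\tau}|^2$ quoted in the text. The function names and the lag spacing Δτ = 1 (one year) are likewise illustrative.

```python
import cmath
import math


def prewhiten(y, c=0.75):
    """Assumed filter: z_t = y_t - c * y_{t-1} (one output per consecutive pair)."""
    return [y[t] - c * y[t - 1] for t in range(1, len(y))]


def prewhitening_factor(f, c=0.75, dtau=1.0):
    """Power transfer |1 - c exp(-i 2 pi f dtau)|^2 of the filter above."""
    return abs(1.0 - c * cmath.exp(-2j * math.pi * f * dtau)) ** 2


def recolor(prewhitened_estimate, f, c=0.75, dtau=1.0):
    """Undo the prewhitening by dividing by the transfer factor."""
    return prewhitened_estimate / prewhitening_factor(f, c, dtau)


print(prewhiten([1.0, 1.0, 1.0]))  # a constant series is attenuated to 0.25
print(prewhitening_factor(0.0))    # 0.0625 at zero frequency
print(prewhitening_factor(0.5))    # about 3.0625 at the folding frequency
```

Under this assumption the filter reduces power near zero frequency by a factor of 16 and raises it about threefold at the folding frequency, which is the direction needed to flatten a spectrum dominated by low frequencies.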

Figures 3 to 6 show the spectral analysis of $x_t$, $y_t$, and $z_t$. A total of 60 
lags, spaced one year apart, was used in each case (m = 60). For the raw series 
$x_t$, Fig. 3 gives 3 sets of spectrum estimates, corresponding to the outer, inner, 
and Hamming spectrum windows. 

The Hamming window is not an ideal window; its results are included be- 
cause of its extensive use. It has the same area, and almost the same main lobe 
as the outer window. It has an 8% higher peak value, however, which explains 
why the Hamming estimates sometimes exceed the outer Bartlett estimates 
at peak frequencies. Fig. 3 illustrates some very bad window leakage. The 
trend, close to zero frequency, shows up in all the odd outer estimates and all 
the even inner estimates, even driving 18 of the latter estimates negative. This 
analysis indicates that something should be done to reduce the window leakage. 

Fig. 4 gives 2 bands of 80% confidence intervals for the spectrum, one band 
derived from the Hamming estimates, the other from the outer-inner pair of 
spectrum estimates. Each interval is derived as explained in appendix B. Note 
that the latter band is much wider, because it allows for the window leakage 
in the sense of footnote 1. 

Fig. 5 shows how detrending the series improves the spectrum estimation. 
Fig. 6 shows the further improvement that prewhitening-recoloring makes. 
The window leakage has become of minor importance, except at the very weakest 
frequencies. The upper and lower estimates give roughly the same answer as 
the Hamming estimates, with the added assurance that window leakage has 
been fully accounted for. 

The computations were carried out on Princeton University’s IBM 650 com- 
puter at 167 Nassau Street. The well-known Statisan II program [6] was modi- 
fied to use the outer and inner windows (for the spectrum estimates, but not 
for the cross spectrum estimates). The modified program is designated “Statisan 
2B.” Copies may be obtained from the Statistical Techniques Research Group, 
Princeton University, at 167 Nassau Street, Princeton, N. J. 


APPENDIX A. 


DETAILS OF LOWER (MODIFIED BARTLETT) SPECTRUM WINDOW 


The lower spectrum window $B_*(f, f_1)$ depends also upon the maximum number 
of lags m; this dependence will be indicated by extending the notation to 
$B_{*m}(f, f_1)$. As explained in equation (6), we can restrict our investigation to the 
case where $f_1 = 0$; this has, in fact, been done in Fig. 2, and we shall continue 
to do so here. 

Now the following 3 questions arise: 

(i) Are we sure that $B_{*m}(f, 0)$ is negative for all frequencies outside the 
main lobe? 
(ii) How large are the side lobes of $B_{*m}(f, 0)$, compared to those of the outer 
window $B^*(f, 0)$? 
(iii) How large is the difference $B^*(f, 0) - B_{*m}(f, 0)$ at various frequencies? 

Figure 7—Approximate extremes of outer and inner spectrum windows, centered at zero 
frequency. 





To answer questions (i) and (ii), we introduce the following notation, illus- 
trated in Fig. 2: at frequencies $T_m f = 1, 2, 3, \ldots, j, \ldots, [m/2]$ (where [ ] indicates 
“the integral part of”) the outer window attains its minimum value, zero, 
while the inner window attains a value which is close to the corresponding local 
minimum and will be denoted by $B_{*m}(j)$. The integer j may be interpreted as 
the inner window’s side lobe number. Similarly, at frequencies $T_m f = 1.5, 
2.5, \ldots, j + .5, \ldots, [m/2 - .5] + .5$, both inner and outer windows attain 
values which are close to local maxima, and which we shall denote by $B^*(j + .5)$ 
and $B_{*m}(j + .5)$ respectively. All three functions $B^*(j + .5)$, $-B_{*m}(j + .5)$, 
and $-B_{*m}(j)$ decrease more or less proportionally to $(j + .5)^{-2}$ or $j^{-2}$, as Fig. 
7 shows. 

Figure 7 exhibits the following side lobe properties of the inner window 
$B_{*m}(f, 0)$: (i) the side lobes are always negative; (ii) in absolute value, they 
are about 35% greater than (interpolated values for) those of the outer window 
$B^*(f, 0)$, except for the first side lobe, which is reduced to about 85% of the 
second. 


As a verification of the above approximations to the extremes of the side 
lobes of the inner window, Table 1 compares approximate extremes to the 
exact extremes. 


TABLE 1 


Ratio of local extremes to values at integer or half-integer j. 


lower window upper window 
m = 20 m = 60 m = 20 

1.05 1.06 

.61 .59 : 1.04 

4. 1.11 

: .89 , 1.01 

=: 1.05 
.94 1.00 
1.02 
.96 
1.01 

Question (iii) has already been partly answered: The side lobe behaviour 
determines the difference $B^*(f, 0) - B_{*m}(f, 0)$ for frequencies outside the main 
lobe. The integral of the difference, 

$$\int_{-1/2\Delta\tau}^{+1/2\Delta\tau} \{B^*(f, 0) - B_{*m}(f, 0)\}\, df$$

is, by Parseval’s theorem, proportional to the difference in the lag windows 
at zero lag, found from equations (1) and (2) to be 

$$b^*(0) - b_*(0) = 1.000 - .777 = .223.$$


This means, for example, that if the spectrum is white, the estimate correspond- 
ing to the inner window will, on the average, be 77.7% of the estimate corre- 
sponding to the outer window. Nearly all of this difference is caused by the 
side lobes. (To the 22.3% difference, the main lobe contributes 2.3% for m = 10, 
1.7% for m = 20, 1.5% for m = 60). 


APPENDIX B. 


DETAILS OF CONFIDENCE BANDS 


The following confidence intervals, following Blackman and Tukey [3, p. 
106-112], are only approximate and assume that the stochastic process is Gaus- 
sian, with a spectrum linearly varying over the main lobe of the window. This 
latter assumption very likely does not hold for the zero frequency estimate, 
and the other assumptions are at best dubious. Thus these confidence intervals 
are to be taken only as a loose guide. 

The spectrum estimates are approximately proportional to a $\chi^2$ variable. 
For the Hamming and outer (Bartlett) estimates, the number of degrees of 
freedom is roughly $2(368/60 - 1/3) \approx 12$. The Bartlett estimate has 20% more 
degrees of freedom than the Hamming estimate, but in view of the roughness 
of the approximations, the difference hardly seems worth considering. 
For the inner (modified Bartlett) estimate, the number of degrees of freedom 
is reduced to about 7, for the following reason: the number of degrees of freedom 
is proportional to the squared mean divided by the variance of the estimate, 
that is, proportional to the square of the integral of the window divided by 
the integral of the squared window. This latter integral is about the same for 
the inner window as the outer, while the (net) integral of the inner window is 
reduced, because of the negative side lobes, to about 78% of the integral of the 
outer window. This reduces the number of degrees of freedom by the factor 
$.78^2 \approx .6$, to the value $12 \times .6 \approx 7$. 

The 80% confidence limits can be obtained from the upper and lower 10% 
points of the $\chi^2$ distribution on the appropriate number of degrees of freedom. 
For the Hamming estimates, the lower and upper 80% confidence limits for 
its mean value are $1.55^{-1}$ and $.53^{-1}$ times its realized value. For the inner- 
outer pair of estimates, the upper confidence limit is $.53^{-1}$ times the realized 
value of the outer estimate, and the lower confidence limit is $1.72^{-1}$ times the 
realized value of the inner estimate. These various factors ($1.72^{-1}$, etc.) are 
easily plotted for all frequencies on a logarithmic scale, by merely translating 
the spectrum estimates. 
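
The degrees of freedom and the factors 1.55, .53 and 1.72 quoted above can be checked with a short computation. This is a sketch in Python using only the standard library; the helper names `chi2_cdf` and `chi2_quantile` are hypothetical, and the quantile is found by simple bisection on a series expansion of the incomplete gamma function rather than by any method used by the author.

```python
import math


def chi2_cdf(x, k):
    """CDF of chi-square with k degrees of freedom, from the power series
    of the regularized lower incomplete gamma function P(k/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(p, k):
    """Point x with chi2_cdf(x, k) = p, located by bisection."""
    lo, hi = 0.0, 10.0 * k + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n_obs, m_lags = 368, 60
dof_outer = 2.0 * (n_obs / m_lags - 1.0 / 3.0)  # about 11.6, taken as 12
dof_inner = 12 * 0.78 ** 2                      # about 7.3, taken as 7

# Ratio of the 10% points to the degrees of freedom; the confidence limits
# are the realized estimate divided by these ratios.
upper_12 = chi2_quantile(0.90, 12) / 12  # about 1.55
lower_12 = chi2_quantile(0.10, 12) / 12  # about 0.53
upper_7 = chi2_quantile(0.90, 7) / 7     # about 1.72
print(round(dof_outer, 1), round(dof_inner, 1))
print(round(upper_12, 2), round(lower_12, 2), round(upper_7, 2))
```

Because each limit is a fixed multiple of the realized estimate, the band has constant width on a logarithmic scale, which is why the text describes plotting as a mere translation of the estimates.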
Frequency Response From Stationary Noise: 
Two Case Histories 


N. R. GOODMAN,¹ S. KATZ, B. H. KRAMER² AND M. T. KUO 

Two continuous chemical processes, regarded as input-output systems, were 
probed with stationary noise, and their frequency response characteristics estimated 
from spectral analyses of the input and output records. Statistical confidence bands 
for the estimated system gain and phase were computed. The noises were super- 
imposed on steady operating levels of the systems, and the analyses conducted, 
following Goodman [1], on the assumption that the output fluctuation in each case 
was the sum of a linear operation on the input fluctuation and a corrupting noise 
uncorrelated with the input. 

One process was a blending operation realized in bench scale hardware; the other 
was a digital computer simulation of a continuous stirred tank reactor. The blender 
was essentially linear; the reactor, highly nonlinear. For each process, the theoretical 
exact (or linearized) frequency response characteristic was computed beforehand 
from the appropriate differential equations for comparison with the experimental 
results. 

The agreement between the theoretical and the experimental results was reason- 
ably good for the linear process, but noticeably poorer for the nonlinear process, 
and, as might be expected, progressively poorer for the nonlinear process as the 
amplitude of the probing noise was increased. A reason offered for this behavior is 
the difficulty of ensuring, especially for nonlinear processes, that the corrupting 
noises in a system under study are statistically uncorrelated with the system driving 
forces. 


1. INTRODUCTION 


Under the influence of the ideas of servomechanisms theory, modern process 
control technique regards the process as an input-output device [2, 3, 4, 5], 
and begins with an attempt to find a quantitative description of its dynamic 
operating characteristic. In the simplest cases, the input-output system may 
be represented schematically as follows: 


x(t) → [ system ] → y(t)        (1) 


where t is the time, x(t) is the system driving force and y(t) is the system re- 
sponse. In the chemical process applications, the driving forces and the responses 
are typically flows, temperatures, concentrations. This study reports two experi- 
mental case histories in estimating the dynamic characteristics of simple 
chemical processing units from suitable measurements of the system driving 
forces and responses. 

The aim in each case is to measure the dynamic characteristic of the process 
for small fluctuations about a fixed operating level, with a view, say, to de- 
signing a control system to regulate the process at this given level. There is no 
attempt to derive information about the steady state relationship between 
driving force and response [14], so as to be able to conduct intelligent search 
for better operating levels. 

When the system (1) represents a continuous process, with the driving force 
x and the response y given as small fluctuations around their operating steady 
states, the system may to a certain approximation be regarded as linear (in the 
sense that the response to a sum of elementary disturbances to the driving 
force may be computed as the sum of the corresponding elementary responses) 
and time-invariant (in the sense that its descriptive parameters do not depend 
on the time). Since quite arbitrary functions of time may be represented as 
sums (or integrals) of sinusoids, we know the whole dynamic characteristic of 
a linear system (1) when we know how the system responds to an arbitrary 
sinusoid 


$$x(t) = a \sin(2\pi f t + \phi) \tag{2}$$


of frequency f, amplitude a and phase $\phi$. For a time-invariant linear system, 
the response to a sinusoid (2) is another sinusoid of the same frequency f, but 
with amplitude multiplied by a factor G(f) > 0 and with phase shifted by a 
term P(f): 


$$y(t) = a\,G(f) \sin(2\pi f t + \phi + P(f)) \tag{3}$$


The gain function G and the phase shift function P of the frequency f thus 
furnish the required dynamic characteristic of a linear time-invariant system. 
They are often for analytical convenience assembled into a single complex- 
valued function of f 


$$L(f) = G e^{iP}, \tag{4}$$


the frequency response function of the system. 

A straightforward method of measuring the frequency response of a system 
(1) is given directly by equations (2), (3). One imposes a sinusoid of known 
frequency f on the system, waits for the response to settle into a steady sinusoid, 
and measures the amplitude ratio and phase difference for the process input 
and output to obtain the values of G and P for that frequency [6, 7]. The whole 
procedure must then be repeated for a number of different frequencies to obtain 
suitably fine-grained plots of G(f) and P(f). The real difficulties in this measure- 
ment procedure arise when the relationship between y and x in the system (1) 
is corrupted by a disturbance n, as shown below: 


                    n(t) 
                     ↓ 
x(t) → [ system ] → (+) → y(t)        (5) 


Here the sinusoidal driving force (2) produces, not the pure sinusoidal response 
(3), but rather (3) affected by the disturbance n. The measured response must 
then be subjected to some sort of (more or less elaborate) filtering procedure 
in order to recover estimates of the system gain and phase shift. 

An alternate way of measuring the frequency response of a system (1) or 
(5) is to impose on the system a very complicated driving force x, with com- 
ponents at all (or many) frequencies in the range of interest, measure the corre- 
sponding response y, and, by a suitable Fourier analysis, recover the gain and 
phase functions of (2), (3) at one blow for all the frequencies of interest. An 
attractive feature of this approach is the possibility that disturbances of the 
required character will be found in the normal operation of the process, so that 
no special experiments will have to be designed in order to make the required 
measurements. But even more importantly, because of the complicated nature 
of the driving force x (presumably also of the disturbance n) and hence of the 
response y, one may consider them to be realizations of random noises [8, 9], 
and so use statistical ideas and methods both to assist in filtering out the dis- 
turbance n and to obtain measures of the sampling variability of the estimated 
gain and phase curves. This statistical approach is the one applied in this study. 

Goodman [1] examined the system (5) on the assumption that x, y and n 
were stationary Gaussian noises, with x and n statistically uncorrelated with 
each other. His analysis showed that from the input spectrum $S_x(f)$ and the 
input-output cross-spectrum $S_{xy}(f)$, one could recover the frequency response 
function (4) in the form 

$$L(f) = S_{xy}(f)/S_x(f) \tag{6}$$

as though there were no disturbance n at all. (The spectra S(f) measure, in 
the electrical engineer's language, the power densities, at the frequencies f, 
associated with the noises x, y etc. They represent accordingly the result of a 
Fourier analysis of the noises. $S_x$ is a real non-negative function of the frequency 
f, and $S_{xy}$ a complex-valued function.) The effect of the disturbance n appeared 
only in the coherence $\gamma^2(f)$ between x and y, a measure of linear dependence 
analogous to the square of a correlation coefficient, defined as 

$$\gamma^2(f) = \frac{|S_{xy}(f)|^2}{S_x(f)\,S_y(f)} \tag{7}$$

where $S_y(f)$ is the spectrum of the output noise y. The effect of n was to throw 
the coherence into the form 

$$\gamma^2(f) = \frac{1}{1 + \dfrac{S_n(f)}{|L(f)|^2\,S_x(f)}} \tag{8}$$

where $S_n(f)$ is the spectrum of n. It may be seen from (8) just how the coherence 
decreases as the size of the disturbance increases. 
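
Relations (6), (7) and (8) can be tied together with a small numerical illustration. The Python sketch below uses made-up spectral values at a single frequency (they are not data from the study), and the function names are hypothetical. It builds the output spectrum and cross-spectrum for system (5) under the assumption that n is uncorrelated with x, then confirms that the two expressions for the coherence agree.

```python
def frequency_response(S_xy, S_x):
    """Equation (6): L(f) = S_xy(f) / S_x(f)."""
    return S_xy / S_x


def coherence(S_xy, S_x, S_y):
    """Equation (7): squared coherence between input and output."""
    return abs(S_xy) ** 2 / (S_x * S_y)


def coherence_from_noise(L, S_x, S_n):
    """Equation (8): coherence in terms of the disturbance spectrum."""
    return 1.0 / (1.0 + S_n / (abs(L) ** 2 * S_x))


# Illustrative values at one frequency (assumed for the example).
L_true = 0.5 - 0.5j  # frequency response
S_x = 2.0            # input spectrum
S_n = 0.25           # disturbance spectrum

# With n uncorrelated with x: S_xy = L S_x and S_y = |L|^2 S_x + S_n.
S_xy = L_true * S_x
S_y = abs(L_true) ** 2 * S_x + S_n

print(frequency_response(S_xy, S_x))            # recovers L_true
print(coherence(S_xy, S_x, S_y))                # about 0.8
print(coherence_from_noise(L_true, S_x, S_n))   # about 0.8
```

Raising `S_n` in this example lowers both coherence values together while leaving the recovered frequency response unchanged, which is the behavior the text describes.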

It might be remarked parenthetically that the concepts and results noted 
in the preceding paragraph can be readily exhibited in terms of the spectral 
decomposition of the stationary noises x, y, n: 

$$x(t) = \int e^{2\pi i f t}\, z_x(f)\, df$$

etc., where 

$$\langle \overline{z_x(f)}\, z_x(f') \rangle = S_x(f)\, \delta(f - f')$$

$$\langle \overline{z_x(f)}\, z_y(f') \rangle = S_{xy}(f)\, \delta(f - f')$$

etc. (The angle brackets denote statistical averages, the bar, the complex conju- 
gate, and δ the Dirac delta function). The input-output relationship for (5) may 
then be written in terms of the spectral random variables z as 

$$z_y(f) = L(f)\, z_x(f) + z_n(f)$$

with the uncorrelatedness between n and x appearing as the vanishing of the 
cross-spectrum $S_{xn}$. The relation (6) then follows at once. The coherence (7) 
appears as essentially a squared correlation coefficient between the spectral 
random variables $z_x$, $z_y$, and a straightforward algebraic manipulation gives 
the relation (8). 

Following Tukey [10, 11], Goodman established methods of estimating the 
spectra $S_x$, $S_y$, $S_{xy}$ from finite sampled records of x and y, and hence, by mim- 
icking (6) and (7), of estimating the frequency response function L and the 
coherence $\gamma^2$. Further, he developed statistical confidence statements for the 
estimated frequency response characteristic in terms of the record length, the 
resolution sought on the frequency axis, and the underlying coherence. With 
some prior knowledge about the system and the scale of the disturbances affect- 
ing it, it thus became possible to plan beforehand the length of run required to 
achieve a given precision in the estimated frequency response, at a set confi- 
dence (probability) level, for a desired spread of points on the frequency axis. 

Figure 1—Blender gain. 

Figure 2—Blender phase shift. 

These methods are those applied in the present study. It might be noted 
that, given the technique of estimating the spectra $S_x$, $S_{xy}$, the estimate of 
the frequency response function L is based squarely on equation (6), which 
in turn depends strongly on the assumption that x and n are uncorrelated. This 
assumption will be called into question later, in the course of a discussion of 
the experimental results. 

The present study applies these methods to two chemical processing situa- 
tions: a blending operation realized in bench scale hardware, and a digital 
computer simulation of a stirred tank reactor. The systems were probed in 
each case with suitably designed stationary noises, and long enough runs made 
to obtain (roughly) 10% accuracy on the estimated system gains with 50% 
confidence (probability .5). For each system, the theoretical exact (or linear- 
ized) frequency response characteristic was computed beforehand from the 
appropriate differential equations for comparison with the experimental results. 
Only one experimental run was made on the (essentially linear) blender, but 
several runs were made on the highly nonlinear reactor, to see just how the 
nonlinearities took hold as the amplitude of the input disturbance was increased. 
The comparisons between the theoretical and experimental results are shown 
for all runs in Figures 1-10. Sections 2 and 3 following describe, respectively, 
the blender and the reactor experiments, and Section 4 is a short discussion of 
the results. Section 5 is essentially a computational appendix, summarizing 
the methods of calculation and quoting the appropriate statistical results. 

Figure 3—Reactor gain. (Input noise σ = .025) 

Figure 4—Reactor phase shift. (Input noise σ = .025) 

The emphasis in the present study on spectra and frequency response is of 
course without prejudice to the work of Reswick [15] and others, who analyze 
the system (5) in the time domain, recovering the impulse response function 
$K(\tau)$ more or less directly from the integral relation 

$$y(t) = \int_0^\infty K(\tau)\, x(t - \tau)\, d\tau + n(t)$$

The frequency response function (4) is just the Fourier transform of the impulse 
response 

$$L(f) = \int_0^\infty K(\tau)\, e^{-2\pi i f \tau}\, d\tau$$

and both methods of analysis are in principle entirely equivalent. The control 
engineer will however commonly prefer to work in the frequency domain, if 
only because of the simpler mathematical form of the input-output relationships. 

Figure 5—Reactor gain. (Input noise σ = .05) 


2. THE BLENDER 


This Section describes the linear blending operation studied: a continuous 
input-output mixer with temperature as tracer. The frequency response func- 
tion measured was between the vessel inlet and outlet temperatures. 

A glass vessel with reasonably good thermal insulation was continuously 
fed with water and allowed to overflow to a sink. The inlet water was fed from 
a constant head tank to assure steady flow. Interposed in the water feed line 
was an electrical immersion heater, whose electrical connections were brought 
out to a variable transformer. The contents of the vessel were continuously 
(and vigorously) stirred. The temperatures of the water entering and leaving 
the vessel were measured by conventional thermocouple arrangements, and 
recorded on a Leeds and Northrup Type G Speedomax Recorder. 

Figure 6—Reactor phase shift. (Input noise σ = .05) 

A single experimental run was made, with the noisy input to the system 
applied to the transformer controlling the immersion heater. Before the start 
of the experimental run proper, the transformer was set to a steady voltage, 
and left there until the outlet temperature reached its effective steady state. 
Once the steady state had been established, a precomputed noise was applied 
manually to the transformer setting. This noise was constructed as a sequence 
of independent Gaussian amplitudes, of constant mean and variance, with 
switching from one to the next at the points of a Poisson process in time. The 
spectral shape characteristic of this noise 

$$S(f) = \frac{a^2}{1 + b f^2}$$

was of course not found in the actually recorded vessel inlet temperature, be- 
cause of the damping action of the heater system. 
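
The construction of the probing noise described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' precomputed sequence: the function name, the sampling of the signal on a regular grid, and the particular mean, standard deviation and seed are all hypothetical. Independent Gaussian levels are held constant between the points of a Poisson process, whose gaps are exponentially distributed.

```python
import random


def poisson_switched_noise(n_samples, dt, mean_interval, mean, sd, seed=0):
    """Sample a piecewise-constant noise on a grid of spacing dt.

    The level is redrawn from a Gaussian at the points of a Poisson
    process with the given mean interval between switches.
    """
    rng = random.Random(seed)
    level = rng.gauss(mean, sd)
    next_switch = rng.expovariate(1.0 / mean_interval)
    samples = []
    for k in range(n_samples):
        t = k * dt
        while t >= next_switch:
            level = rng.gauss(mean, sd)  # new independent amplitude
            next_switch += rng.expovariate(1.0 / mean_interval)
        samples.append(level)
    return samples


# Illustrative settings: a 300 minute record sampled every 0.4 minute,
# with switches on the average once per minute.
noise = poisson_switched_noise(750, 0.4, 1.0, mean=0.0, sd=1.0, seed=1)
changes = sum(1 for a, b in zip(noise, noise[1:]) if a != b)
print(len(noise), changes)
```

A noise of this kind has an exponentially decaying autocorrelation, which is what gives the low-pass spectral shape referred to in the text.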
The behavior of the blender may be described by the linear differential 
equation 

$$v c \frac{dU}{dt} = m c (T - U) - q$$

where 

t = time, min. 
T(t) = vessel inlet temperature, °C 
U(t) = vessel outlet temperature, °C 
v = vessel volume, l. 
m = water feed (takeoff) rate, l./min. 
c = volumetric specific heat of water, cal./l.-°C 
q = rate of heat loss from vessel, cal./min. 

For a steady feed temperature 

$$T(t) = \bar{T}$$

the corresponding steady outlet temperature is 

$$U(t) = \bar{U} = \bar{T} - \frac{q}{mc}.$$

The departures from these steady values 

$$x(t) = T(t) - \bar{T}$$

$$y(t) = U(t) - \bar{U}$$

accordingly are connected by the differential equation 

$$\theta \frac{dy}{dt} + y = x \tag{9}$$

where 

θ = v/m = nominal residence time of water in vessel, min. 

[Figure 7: plot of gain G(f) against frequency f (1 frequency unit = 1/1920 cps).] 

Figure 7—Reactor gain. (Input noise σ = .1) 

Figure 9—Reactor gain. (Input noise σ = .2) 


The standard linear blending equation (9) may be considered as defining an 
input-output process (1). Disturbances to this input-output system as in (5) 
may be taken to arise perhaps from fluctuations in the flow rate (m), or in the 
ambient temperature (affecting q). A solution by transforms of the differential 
equation (9) leads to the frequency response function 

$$L(f) = \frac{1}{1 + 2\pi i f \theta} \tag{10}$$

which has, following (4), the gain and phase parts 

$$G(f) = \frac{1}{\sqrt{1 + (2\pi f \theta)^2}}$$

$$P(f) = -\arctan(2\pi f \theta)$$

where f is in cycles per minute. 
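
The theoretical blender characteristic can be tabulated directly from the frequency response function. The Python sketch below is illustrative: the function names are hypothetical, and θ is computed from the vessel volume and feed rate quoted for the experimental run (v = .51 l., m = .11 l./min.). It evaluates the complex response and reads off gain and phase, which should agree with the closed forms for G(f) and P(f).

```python
import cmath
import math

theta = 0.51 / 0.11  # nominal residence time v/m, in minutes (about 4.6)


def blender_response(f_cpm, theta_min=theta):
    """Frequency response L(f) = 1 / (1 + 2 pi i f theta), f in cycles per minute."""
    return 1.0 / (1.0 + 2j * math.pi * f_cpm * theta_min)


def gain_and_phase(f_cpm, theta_min=theta):
    """Gain G(f) and phase shift P(f) in radians, read from L = G exp(iP)."""
    L = blender_response(f_cpm, theta_min)
    return abs(L), cmath.phase(L)


corner = 1.0 / (2.0 * math.pi * theta)  # frequency at which 2 pi f theta = 1
for f in (0.0, corner, 0.25, 1.25):
    G, P = gain_and_phase(f)
    print(f"f = {f:.4f} cpm  gain = {G:.4f}  phase = {math.degrees(P):.1f} deg")
```

At the corner frequency the gain is 1/√2 and the phase lag is 45°, so with θ near 4.6 minutes most of the variation in the characteristic falls well inside the 0 to 1.25 cpm range covered by the run.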
Figure 10—Reactor phase shift. (Input noise σ = .2) 


The experimental run aimed at measuring the frequency response character- 
istic (10) was made with 

v = .51 l. 
m = .11 l./min. 

leading to a nominal water residence time in the vessel 

θ = v/m = 4.6 min. 

The run was made at an average vessel inlet temperature ($\bar{T}$) of roughly 40°C, 
with the random swings in the heater transformer settings designed to give 
inlet temperature swings of the order of ±10°C. The average time interval 
between changes in the transformer settings was taken to be 1 minute, so that 
there were on the average 4 or 5 changes in every nominal residence time. The 
overall length of the run was approximately 5 hours, or about 65 nominal resi- 
dence times. The input-output data (T and U) that entered the spectral calcula- 
tion of Section 5 were taken every 2/5 minute (every third recorded point), 
since this was found to give a sufficiently high Nyquist cutoff frequency. The 
spectra and the frequency response were calculated at frequency intervals of 
1/32 cpm, giving 40 points in the fundamental frequency range 0 to 1.25 cpm. 
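
The sampling figures quoted above follow from simple arithmetic, set out here as a short Python sketch with hypothetical variable names. The last function gives the phase lag of a pure time delay; treating a fixed offset between readings as such a delay is an interpretation offered for illustration, not a statement of how the authors made their corrections.

```python
sample_interval_min = 2.0 / 5.0
nyquist_cpm = 1.0 / (2.0 * sample_interval_min)        # folding frequency
frequency_step_cpm = 1.0 / 32.0
n_points = round(nyquist_cpm / frequency_step_cpm)     # points in 0 to Nyquist

residence_time_min = 0.51 / 0.11                       # theta = v/m
run_length_min = 5.0 * 60.0
n_residence_times = run_length_min / residence_time_min


def delay_phase_deg(f_cpm, delay_min):
    """Phase lag, in degrees, of a pure time delay at frequency f."""
    return 360.0 * f_cpm * delay_min


print(nyquist_cpm, n_points)                    # 1.25 cpm and 40 points
print(round(residence_time_min, 1))             # about 4.6 min
print(round(n_residence_times))                 # about 65 residence times
print(delay_phase_deg(nyquist_cpm, 4.0 / 60.0)) # about 30 degrees for a 4 second delay
```

A delay of a few seconds is negligible at low frequencies but amounts to tens of degrees near the folding frequency, which suggests why even a small timing offset between two recorded channels matters for a phase estimate.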

Two small side adjustments had to be made to the experimental results over 
and above the main calculation of Section 5. The first estimated and corrected 
for the instrument error in reading the inlet temperature. The second corrected 
for a 4 second time interval arising from the nature of the recording equipment, 
between the readings of the inlet and outlet temperatures. 

In the experimental run, there was apparently some source of disturbance 
to the pure connection between x and y shown in (9), since the spectral calcula- 
tion showed sample coherences (particularly at low frequencies) as low as .8. 
A suitable average value of the observed coherences was used for computing 
the confidence limits around the estimated gain and phase shift functions. The 
comparison between the theoretical and estimated frequency response character- 
istics is shown in Figures 1-2. This comparison is discussed in Section 4. 


3. THE REACTOR 


This Section describes the nonlinear reactor operation studied: a continuous 
stirred tank reactor supporting an exothermic second order reaction, and cooled 
by passage of water through a jacket. The (linearized) frequency response func- 
tion measured was between the inlet and outlet reagent concentrations. This 
reactor system had been the subject of a combined analytic-numerical study 
of reactor stability and control by Amundson and others [12, 13], and the ex- 
perimental runs reported here were actually digital computer simulations. 
The reaction in question was taken to be of the form 


2M → product + heat 


and the reactor was fed at a constant volumetric rate with a mixture of M and 
inerts. The temperature of the feed was held constant. In order to regulate 
the inherent instability of the reactor at its normal operating point, a control 
action on the coolant flow rate was taken, proportional to the departure of the 
temperature of the process material in the vessel from its normal operating 
value. The inlet temperature of the coolant was held constant. The reaction 
vessel was taken to be perfectly mixed, in the sense that it could support no 
temperature or concentration gradients. 

Four experimental runs were made, the digital computer simulation consist- 
ing of the step-by-step solution of a pair of differential equations describing 
material and energy balances in the reactor. The inlet and outlet reagent con- 
centrations were printed at suitable time intervals. Each run was started from 
the normal operating point of the system, in order to minimize the run time 
necessary to reach a stationary condition. During the course of each run, a 
noise was applied by the computer to the inlet reagent concentration. This 
noise was constructed by computing on line a sequence of independent Gaussian 
amplitudes of constant mean and variance, and switching from one to the next 
at each time step in the numerical integration procedure. Thus the noise was 
effectively white (flat spectrum) when sampled at any time interval greater 
than the integration step. The mean of the noisy inlet reagent concentration 
was held constant at its normal operating value for all the runs, but the variance 
was changed from run to run in order to explore the effect of wider excursions 
into the nonlinear. 

The behavior of the reactor studied here may be described by the material
and energy balances

$$V \frac{dA}{dt} = m(A_0 - A) - V p A^2 e^{-E/R\theta}$$

$$V c \frac{d\theta}{dt} = m c(\theta_0 - \theta) + Q V p A^2 e^{-E/R\theta} - D \tag{11}$$

$$D = S h(\theta - \theta'') = m' c'(\theta''' - \theta') \tag{12}$$

where

$t$ = time, sec.
$A_0(t)$ = concentration of reagent in feed, # mols/ft$^3$
$A(t)$ = concentration of reagent in reactor (and outlet), # mols/ft$^3$
$\theta_0$ = process feed temperature = 650°R
$\theta(t)$ = temperature in reactor (and outlet), °R
$\theta'$ = inlet coolant temperature = 520°R
$\theta''(t)$ = "average" coolant temperature, °R
$\theta'''(t)$ = outlet coolant temperature, °R
$p$ = frequency factor in second order rate expression = (2.7) $10^{11}$ ft$^3$/sec-# mol
$E$ = activation energy in second order rate expression = 44700 BTU/# mol
$Q$ = heat of reaction (> 0 if exothermic) = 20000 BTU/# mol
$R$ = gas constant = 1.987 BTU/# mol-°R
$V$ = reactor volume = 100 ft$^3$
$m$ = feed (takeoff) rate of process material = .3 ft$^3$/sec.
$m'(t)$ = feed (takeoff) rate of coolant, ft$^3$/sec.
$c$ = volumetric specific heat of process material = 60 BTU/ft$^3$-°R
$c'$ = volumetric specific heat of coolant = 62.5 BTU/ft$^3$-°R
$S$ = heat transfer surface between coolant and reactor = 500 ft$^2$
$h$ = overall heat transfer coefficient between coolant and reactor = .02778 BTU/sec-ft$^2$-°R
$D(t)$ = heat duty of coolant, BTU/sec.


With the assumption that the "average" coolant temperature is actually the
arithmetic average between the inlet and outlet coolant temperatures

$$\theta'' = \tfrac{1}{2}(\theta' + \theta''')$$

the average and exit coolant temperatures $\theta''$ and $\theta'''$ can be eliminated from
(12), leaving

$$D = \frac{\theta - \theta'}{\dfrac{1}{Sh} + \dfrac{1}{2m'c'}} \tag{13}$$
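The quantities describing the steady state operating point of the reactor

$$\bar{A}_0, \quad \bar{A}, \quad \bar{\theta}, \quad \bar{m}', \quad \bar{D}$$

must satisfy the steady state versions of (11), (13)

$$0 = m(\bar{A}_0 - \bar{A}) - V p \bar{A}^2 e^{-E/R\bar{\theta}}$$

$$0 = m c(\theta_0 - \bar{\theta}) + Q V p \bar{A}^2 e^{-E/R\bar{\theta}} - \bar{D}$$

$$\bar{D} = \frac{\bar{\theta} - \theta'}{\dfrac{1}{Sh} + \dfrac{1}{2\bar{m}'c'}}$$

for which a consistent set of values is

$$\bar{A}_0 = 1, \quad \bar{A} = .619, \quad \bar{\theta} = 700, \quad \bar{m}' = .138, \quad \bar{D} = 1380. \tag{14}$$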

As noted earlier, this operating point represents an unstable condition in the 
reactor, and in the simulated experimental runs, this instability was neutralized 
by sensing the process temperature $\theta$, and imposing the control action

$$m' = \bar{m}' + a(\theta - \bar{\theta}) \tag{15}$$

on the coolant flow rate $m'$, where
$a$ = proportionality constant in coolant flow rate controller = .01207
ft$^3$/sec-°R
With the numerical values cited, equations (11), (13), (15) may be assembled
to give

$$\frac{dA}{dt} = .003(A_0 - A) - .00298 A^2 e^{(E/R)(1/700 - 1/\theta)}$$

$$\frac{d\theta}{dt} = .003(650 - \theta) + .993 A^2 e^{(E/R)(1/700 - 1/\theta)} - \frac{\theta - 520}{432 + \dfrac{348}{1 + .0875(\theta - 700)}} \tag{16}$$

and these were actually the equations solved to simulate the experimental
runs. With

$$A_0(t) = 1 + x(t)$$

$$A(t) = .619 + y(t) \tag{17}$$
so that x(t) and y(t) represent departures from the operating steady states
noted in (14), the differential equations (16) define an input-output system (1).
For small disturbances x, the system may be considered as approximately
linear, and in the form (5), where the disturbance n is taken as the effect of
the nonlinearities in (16). In the computer solution of (16), these nonlinearities
are indeed the only effective source of the disturbance. On linearizing the
equations (16) around the operating point (14), and applying transform methods,
the (linearized) frequency response function connecting the x and y of (17)
appears as

$$L(f) = \frac{-.003(.0042 - 2\pi i f)}{(.00125 + .00591 i + 2\pi i f)(.00125 - .00591 i + 2\pi i f)} \tag{18}$$


with the corresponding gain and phase functions of (4). The f in (18) is in cycles 
per second. 

The simulated experimental runs aimed at measuring the frequency response 
characteristic (18) were made by carrying out the solution of (16) step by step 
on a digital computer, with starting values 


A lene = 1 
6 line = 700 


corresponding to the operating point (14). A Runge-Kutta integration procedure 
was used with a time step of 2 seconds. Each run was about ten reactor hours 
(120 nominal residence times) long. The input-output data ($A_0$ and $A$) that
entered the spectral calculation of Section 5 were taken every 16 seconds, since
this was found to give a sufficiently high Nyquist cutoff frequency. The spectra
and the frequency response were calculated at frequency intervals of 1/1920
cps, giving 60 points in the fundamental frequency interval 0 to 1/32 cps.
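
The simulation procedure just described can be sketched as follows. This is an illustration under stated assumptions, not the program used in the study: the Arrhenius exponent is written as (E/R)(1/700 - 1/theta) using the constants listed with (11), the noise is clipped below only, a small floor on the coolant flow factor is added as a numerical guard, and all function names are hypothetical.

```python
import math
import random

E_OVER_R = 44700.0 / 1.987  # activation energy over gas constant, deg R


def rates(A, theta, A0):
    """Right-hand sides of the assembled reactor equations (16)."""
    arrhenius = math.exp(E_OVER_R * (1.0 / 700.0 - 1.0 / theta))
    # Guard (assumption): keep the controlled coolant flow factor positive.
    flow = max(1e-9, 1.0 + 0.0875 * (theta - 700.0))
    duty = (theta - 520.0) / (432.0 + 348.0 / flow)
    dA = 0.003 * (A0 - A) - 0.00298 * A * A * arrhenius
    dtheta = 0.003 * (650.0 - theta) + 0.993 * A * A * arrhenius - duty
    return dA, dtheta


def rk4_step(A, theta, A0, dt):
    """One classical Runge-Kutta step with the inlet value held constant."""
    k1 = rates(A, theta, A0)
    k2 = rates(A + 0.5 * dt * k1[0], theta + 0.5 * dt * k1[1], A0)
    k3 = rates(A + 0.5 * dt * k2[0], theta + 0.5 * dt * k2[1], A0)
    k4 = rates(A + dt * k3[0], theta + dt * k3[1], A0)
    A += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
    theta += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return A, theta


def simulate(sigma, n_steps, dt=2.0, keep_every=8, seed=0):
    """Drive the reactor with Gaussian inlet noise; record every 16 seconds."""
    rng = random.Random(seed)
    A, theta = 0.619, 700.0
    record = []
    for step in range(n_steps):
        A0 = max(0.0, rng.gauss(1.0, sigma))  # clipped below at zero
        A, theta = rk4_step(A, theta, A0, dt)
        if (step + 1) % keep_every == 0:
            record.append((A0, A))
    return record, (A, theta)
```

With `sigma = 0` the state should stay close to the operating point, which gives a quick check that the constants in (16) are mutually consistent.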

During each run, the noise imposed on the inlet concentration $A_0$ was constructed
as a sequence of independent Gaussian amplitudes, one for each 2
second integration step. The Gaussian variates all had the same mean, the $A_0$
of (14), and, within each run, the same variance $\sigma^2$. The four runs made used
progressively larger values of the standard deviation $\sigma$

$$\sigma = .025, \; .05, \; .1, \; .2$$

in order to see how the nonlinearities in (16) affected the estimates of the frequency
response characteristic (18) as the amplitude of the input disturbance
was increased. (The amplitudes of $A_0$ actually used in the calculation were
clipped below in order to avoid negative concentrations, and clipped above in
order to avoid scaling difficulties in the calculation.) Since the input noise was
heavily aliased at the sampling interval used, the estimates of frequency response 
were made using the known flat spectrum for the input, rather than that com- 
puted by the methods of Section 5. 

As the amplitude of the disturbance was increased, the coherence estimated 
in the course of the spectral calculation fell, because of the increased effect of 
the non-linearities. For each run, a suitable average value of the estimated 
coherence was used in order to compute the confidence limits around the esti- 
mated gain and phase shift functions. The comparisons between the theoretical 
and estimated frequency response characteristics are shown for all the runs in
Figures 3-10. These comparisons are discussed in Section 4.


4. DISCUSSION OF RESULTS


Figures 1-10 show the comparison between the theoretical and experimental 
frequency response characteristics for the case studies reported here. Figures 
1-2 make the comparison for the single experimental run on the blender, and 
Figures 3-10 for the four simulated runs on the reactor. Figures 3-4 are for the
reactor run with the weakest input noise, with standard deviation $\sigma = .025$;
Figures 5-6 have $\sigma = .05$; Figures 7-8 have $\sigma = .1$; and Figures 9-10 are for
the strongest input noise, $\sigma = .2$. In all the Figures, the theoretical curves are
shown as solid lines, and the corresponding experimental curves as heavy broken 
lines. The confidence bands for the experimental curves are shown as light 
broken lines around the experimental curves. The confidence level in every 
case is 50%; that is, in a given experimental run, the probability is .5 that the 
theoretical gain-phase pair for a particular frequency lies within the confidence 
limits around the corresponding discovered experimental values. 

The prime thing to be noticed in these curves is that agreement between 
theory and experiment seems to be substantially better for the essentially 
linear blender (Figures 1-2) than for the highly nonlinear reactor (Figures 
3-10). Further, the agreement between theory and experiment for the (non- 
linear) reactor gets progressively poorer as the amplitude of the probing noise 
is increased; that is, Figures 3-4 show better agreement than Figures 5-6, 
Figures 5-6 better agreement than Figures 7-8, and Figures 7-8 better agreement
than Figures 9-10. In general, all this is not especially surprising, in view
of the fact that the very idea of a frequency response function is an essentially 
linear one. The nonlinear system suffers on linearization, and suffers more the 
greater the amplitude of the disturbance whose effect is being linearized. 

The poor agreement exhibited at low frequencies in Figures 1-2 even for the 
(linear) blender may well be due to slow drifts in the system parameters over 
the period of the experimental run: perhaps in the water flow rate, or in the 
ambient temperature (affecting the heat leak from the vessel). Such slow drifts 
could easily have distorted the behavior at low frequencies, since the spectral 
calculations were corrected only for the average noise levels, and not for trends. 
The more complete computing scheme outlined in Section 5 shows how to make 
these corrections for trends as well as means. 

The poor agreement in Figures 3-10 for the (nonlinear) reactor cannot be 
attributed to drifts in the system parameters. The experimental runs were 
computer simulations, and no such drifts were introduced into the calculation. 
Indeed, the disagreement at low frequencies is no worse than anywhere else 
in the frequency range. Here, the source of the trouble is likely the whole idea 
of analyzing a nonlinear system as though it were essentially a linear system 
(5), with disturbance n arising from the non-linearities. This is not in itself a 
poor idea, but the present method of analysis rests, through equation (6), on 
the assumption that the non-linearity effect n is statistically uncorrelated with 
the system driving force x. This assumption is likely to be strongly violated in 
a real nonlinear situation, and the resulting estimates of the linearized system 
frequency response substantially distorted. Thus, in the present case, the most
conspicuous feature of the system gain, the characteristic peak near .001 cps, 
is lost in the general oscillation of the experimental results, except for a hint 
in the lowest amplitude run (Figure 3). 

The upshot seems to be that the method of analysis presented here, while 
reasonably good for linear systems, should not be relied on to give any more 
than a rather gross qualitative picture for nonlinear systems. 

As a corollary, it should be pointed out that there is a certain danger in using 
the present method of analysis even for linear systems (5), when the driving 
force x is not imposed experimentally, but rather found in the normal operation 
of the system. In chemical process applications especially, where the system 
under study is often one unit in an integrated plant, there would seem to be 
a very good chance of finding a substantial correlation between the x and n of 
(5) because of the material and energy recycles in the plant. And this correla- 
tion would, as for a nonlinear system, disturb the validity of the relation (6) 
on which the present analysis is based.


5. METHOD OF CALCULATION

This Section sketches the course of an analysis of a finite stretch of stationary
input and output records, x(t) and y(t) of (5), aimed at estimating the frequency
response characteristic of the system (5). The methods of estimation,
and the corresponding statistical confidence statements, follow Tukey [11]
and Goodman [1].

The basic quantities to be estimated from the finite records x(t), y(t) are the
average levels $\mu_x$, $\mu_y$, and the total linear trend variations $\beta_x$, $\beta_y$ over the length
of the run, the covariance (cov.) functions

$$\rho_x(\tau) = \mathrm{cov}\,\{x(t), x(t + \tau)\}$$

$$\rho_y(\tau) = \mathrm{cov}\,\{y(t), y(t + \tau)\}$$

$$\rho_{xy}(\tau) = \mathrm{cov}\,\{x(t), y(t + \tau)\} = \phi_{xy}(\tau) + \psi_{xy}(\tau)$$

$$\rho_{yx}(\tau) = \mathrm{cov}\,\{y(t), x(t + \tau)\} = \rho_{xy}(-\tau)$$

$$\phi_{xy}(\tau) = \tfrac{1}{2}\{\rho_{xy}(\tau) + \rho_{yx}(\tau)\}$$

$$\psi_{xy}(\tau) = \tfrac{1}{2}\{\rho_{xy}(\tau) - \rho_{yx}(\tau)\}$$

and spectra

$$S_x(f) = 2\int_{-\infty}^{\infty} \rho_x(\tau) \cos 2\pi f\tau \, d\tau$$

$$S_y(f) = 2\int_{-\infty}^{\infty} \rho_y(\tau) \cos 2\pi f\tau \, d\tau$$

$$S_{xy}(f) = 2\int_{-\infty}^{\infty} \rho_{xy}(\tau) e^{-2\pi i f\tau} \, d\tau = C_{xy}(f) - iQ_{xy}(f)$$

$$C_{xy}(f) = 2\int_{-\infty}^{\infty} \phi_{xy}(\tau) \cos 2\pi f\tau \, d\tau$$

$$Q_{xy}(f) = 2\int_{-\infty}^{\infty} \psi_{xy}(\tau) \sin 2\pi f\tau \, d\tau$$

the frequency response function of (6)

$$L(f) = \frac{S_{xy}(f)}{S_x(f)}$$

with the gain and phase functions defined by (4)

$$L(f) = G(f)e^{iP(f)}; \qquad G(f) \ge 0$$

and the coherence defined by (7)

$$\gamma^2(f) = \frac{|S_{xy}(f)|^2}{S_x(f)S_y(f)}$$

The final estimates will be denoted by $\hat{\mu}_x$, $\hat{\beta}_x$, $\hat{\rho}_x(\tau)$, $\hat{S}_x(f)$, $\hat{L}(f)$ etc., and auxiliary
intermediate estimates by $\tilde{\rho}_x(\tau)$, $\tilde{S}_x(f)$ etc.
The calculation begins with the (in principle) continuous records x(t), y(t).
One first selects a sampling interval

$$\Delta t = h$$

with its corresponding Nyquist frequency

$$f_0 = \frac{1}{2h} \tag{19}$$

bearing in mind that the sampled data can have no spectral power at frequencies
above $f_0$. Indeed, if $S'(f)$ denotes one of the spectra of the sampled pair x, y,
then it is given in terms of the corresponding spectrum $S(f)$ of the continuous
pair by

$$S'(f) = \sum_{\nu=-\infty}^{\infty} S(f + 2\nu f_0); \qquad |f| < f_0$$

so that the power density in the sampled pair at f is the sum of the power densities
in the continuous pair at f and at all its "aliases" $f + 2\nu f_0$. To minimize
the distortion due to this aliasing, the interval h must be taken small enough
to ensure that the continuous data have no appreciable power above $f_0$. Further,
since the spectral calculations are made at a fixed series of points equally spaced
in the range 0 to $f_0$, the interval h should not be chosen so small that the whole
spectral weight is crowded down to the bottom one or two computed points.
In what follows, we suppose this to be done, so that the sampled data have
essentially the same spectral characteristics as the continuous data, and there
is an appreciable number of computed points in the spectral range of interest.
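
The Nyquist relation (19) and the folding of an out-of-band frequency onto its alias in the fundamental range can be illustrated with a short sketch. The function names are hypothetical, and the folding rule assumes the symmetric (cosine) spectra used in this Section, so that negative aliases reflect onto positive frequencies.

```python
def nyquist_frequency(h):
    """Nyquist frequency f0 = 1/(2h) for sampling interval h."""
    return 1.0 / (2.0 * h)


def principal_alias(f, h):
    """Fold a frequency f >= 0 into the fundamental range 0..f0."""
    f0 = nyquist_frequency(h)
    g = f % (2.0 * f0)          # remove whole multiples of 2*f0
    return g if g <= f0 else 2.0 * f0 - g


if __name__ == "__main__":
    h = 16.0                    # seconds, as in the reactor runs
    print(nyquist_frequency(h))         # 1/32 cps
    print(principal_alias(0.05, h))     # 0.05 cps appears at about 0.0125 cps
```

With the 16 second sampling interval of the reactor runs this gives the 1/32 cps cutoff quoted in Section 3.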
With the sampled records (N data pairs) in hand,

$$x_n = x(t' + nh), \qquad y_n = y(t' + nh), \qquad n = 1, 2, \cdots, N$$

($t'$ is an arbitrary time origin that plays no role in the calculations), one computes
the sample means

$$\hat{\mu}_x = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \hat{\mu}_y = \frac{1}{N}\sum_{n=1}^{N} y_n$$

and total trend variations (from the averages of the top and bottom thirds of
the records)

$$\hat{\beta}_x = \frac{N}{\nu(N - \nu)}\left\{\sum_{n=N-\nu+1}^{N} x_n - \sum_{n=1}^{\nu} x_n\right\}$$

$$\hat{\beta}_y = \frac{N}{\nu(N - \nu)}\left\{\sum_{n=N-\nu+1}^{N} y_n - \sum_{n=1}^{\nu} y_n\right\}$$

$$\nu = [N/3]$$

where [N/3] is the largest integer not exceeding N/3. Estimates of the average
slopes of x and y (with respect to the time t) can be recovered, if desired, as
$\hat{\beta}_x/Nh$, $\hat{\beta}_y/Nh$.
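
A minimal sketch of the mean and trend estimates follows, assuming the total trend variation is the difference between the averages of the top and bottom thirds of the record, rescaled by N/(N - nu) to span the whole run. The function name is hypothetical.

```python
def mean_and_trend(x):
    """Sample mean and total linear trend variation over the record.

    The trend is estimated from the averages of the last and first
    nu = [N/3] points, scaled to the full record length N.
    """
    n = len(x)
    nu = n // 3
    mean = sum(x) / n
    top = sum(x[n - nu:])       # sum over the last nu points
    bottom = sum(x[:nu])        # sum over the first nu points
    trend = n / (nu * (n - nu)) * (top - bottom)
    return mean, trend


if __name__ == "__main__":
    # A pure ramp rising by 2 units per sample over 9 samples.
    ramp = [2.0 * k for k in range(1, 10)]
    print(mean_and_trend(ramp))
```

For the ramp in the example the total trend variation comes out as slope times record length, and dividing it by Nh would recover the slope per unit time.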

One next selects a resolution number R smaller than the record length N.
R is the number of intervals into which, for computing purposes, the fundamental
frequency range 0 to $f_0$ is divided, and will in general be much smaller
than N. The interplay between R and N will be exhibited in the confidence
statements for the estimated system gain and phase given at the close of this
Section.

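With the resolution number R in hand, one computes crude covariance functions
as average lagged products up to lag Rh in the form

$$\tilde{\rho}_x(mh) = \frac{1}{N - m}\sum_{n=1}^{N-m} x_n x_{n+m}$$

$$\tilde{\rho}_y(mh) = \frac{1}{N - m}\sum_{n=1}^{N-m} y_n y_{n+m}$$

$$\tilde{\rho}_{xy}(mh) = \frac{1}{N - m}\sum_{n=1}^{N-m} x_n y_{n+m}$$

$$\tilde{\rho}_{yx}(mh) = \frac{1}{N - m}\sum_{n=1}^{N-m} y_n x_{n+m}$$

$$\tilde{\phi}_{xy}(mh) = \tfrac{1}{2}\{\tilde{\rho}_{xy}(mh) + \tilde{\rho}_{yx}(mh)\}$$

$$\tilde{\psi}_{xy}(mh) = \tfrac{1}{2}\{\tilde{\rho}_{xy}(mh) - \tilde{\rho}_{yx}(mh)\}, \qquad m = 0, 1, 2, \cdots, R$$

and corrects them for mean and trend to give the final sample covariance
functions as

$$\hat{\rho}_x(mh) = \tilde{\rho}_x(mh) - \left\{\hat{\mu}_x^2 + \frac{w}{12}\hat{\beta}_x^2\right\}$$

$$\hat{\rho}_y(mh) = \tilde{\rho}_y(mh) - \left\{\hat{\mu}_y^2 + \frac{w}{12}\hat{\beta}_y^2\right\}$$

$$\hat{\phi}_{xy}(mh) = \tilde{\phi}_{xy}(mh) - \left\{\hat{\mu}_x\hat{\mu}_y + \frac{w}{12}\hat{\beta}_x\hat{\beta}_y\right\}$$

$$\hat{\psi}_{xy}(mh) = \tilde{\psi}_{xy}(mh) - \frac{m}{2N}(\hat{\mu}_x\hat{\beta}_y - \hat{\mu}_y\hat{\beta}_x)$$

$$w = 1 - \frac{2m}{N} - \frac{2m^2 + 1}{N^2}$$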
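
The two-stage covariance calculation (crude lagged products, then correction for mean and trend) can be sketched as below. The correction terms are taken here as mu^2 + (w/12) beta^2 for the even parts and (m/2N)(mu_x beta_y - mu_y beta_x) for the odd part, with w = 1 - 2m/N - (2m^2 + 1)/N^2; these are the values a pure mean-plus-linear-trend record contributes to the lagged products. All function names are hypothetical.

```python
def mean_and_trend(x):
    """Sample mean and total trend variation (top third minus bottom third)."""
    n = len(x)
    nu = n // 3
    return sum(x) / n, n / (nu * (n - nu)) * (sum(x[n - nu:]) - sum(x[:nu]))


def lagged_product(x, y, m):
    """Crude covariance: average of x[n] * y[n + m] over the N - m pairs."""
    n = len(x)
    return sum(x[i] * y[i + m] for i in range(n - m)) / (n - m)


def corrected_covariances(x, y, max_lag):
    """Return (rho_x, rho_y, phi_xy, psi_xy) for lags m = 0..max_lag."""
    n = len(x)
    mx, bx = mean_and_trend(x)
    my, by = mean_and_trend(y)
    out = []
    for m in range(max_lag + 1):
        w = 1.0 - 2.0 * m / n - (2.0 * m * m + 1.0) / (n * n)
        rxy = lagged_product(x, y, m)
        ryx = lagged_product(y, x, m)
        rho_x = lagged_product(x, x, m) - (mx * mx + w * bx * bx / 12.0)
        rho_y = lagged_product(y, y, m) - (my * my + w * by * by / 12.0)
        phi = 0.5 * (rxy + ryx) - (mx * my + w * bx * by / 12.0)
        psi = 0.5 * (rxy - ryx) - m / (2.0 * n) * (mx * by - my * bx)
        out.append((rho_x, rho_y, phi, psi))
    return out
```

A useful check of such a routine is that records consisting only of a mean plus a straight-line trend give corrected covariances that vanish at every lag.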
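The spectral estimation can also conveniently be organized into two stages.
Taking R equal intervals in the fundamental frequency range 0 to $f_0$, one first
computes raw spectral estimates at the lattice points in the form

$$\tilde{S}_x\!\left(\frac{r}{R}f_0\right) = \frac{1}{f_0}\left\{\hat{\rho}_x(0) + 2\sum_{m=1}^{R-1}\hat{\rho}_x(mh)\cos\frac{rm\pi}{R} + \hat{\rho}_x(Rh)\cos r\pi\right\}$$

$$\tilde{S}_y\!\left(\frac{r}{R}f_0\right) = \frac{1}{f_0}\left\{\hat{\rho}_y(0) + 2\sum_{m=1}^{R-1}\hat{\rho}_y(mh)\cos\frac{rm\pi}{R} + \hat{\rho}_y(Rh)\cos r\pi\right\}$$

$$\tilde{C}_{xy}\!\left(\frac{r}{R}f_0\right) = \frac{1}{f_0}\left\{\hat{\phi}_{xy}(0) + 2\sum_{m=1}^{R-1}\hat{\phi}_{xy}(mh)\cos\frac{rm\pi}{R} + \hat{\phi}_{xy}(Rh)\cos r\pi\right\}$$

$$\tilde{Q}_{xy}\!\left(\frac{r}{R}f_0\right) = \frac{1}{f_0}\left\{2\sum_{m=1}^{R-1}\hat{\psi}_{xy}(mh)\sin\frac{rm\pi}{R}\right\}, \qquad r = 0, 1, 2, \cdots, R$$

noting that taking r = -1 and r = R + 1 would give

$$\tilde{S}\!\left(\frac{-1}{R}f_0\right) = \tilde{S}\!\left(\frac{1}{R}f_0\right), \qquad \tilde{S}\!\left(\frac{R+1}{R}f_0\right) = \tilde{S}\!\left(\frac{R-1}{R}f_0\right)$$

for $\tilde{S}_x$, $\tilde{S}_y$ and $\tilde{C}_{xy}$, and

$$\tilde{Q}_{xy}\!\left(\frac{-1}{R}f_0\right) = -\tilde{Q}_{xy}\!\left(\frac{1}{R}f_0\right), \qquad \tilde{Q}_{xy}\!\left(\frac{R+1}{R}f_0\right) = -\tilde{Q}_{xy}\!\left(\frac{R-1}{R}f_0\right).$$

An averaging then gives the final spectral estimates

$$\hat{S}_x\!\left(\frac{r}{R}f_0\right) = .23\,\tilde{S}_x\!\left(\frac{r-1}{R}f_0\right) + .54\,\tilde{S}_x\!\left(\frac{r}{R}f_0\right) + .23\,\tilde{S}_x\!\left(\frac{r+1}{R}f_0\right)$$

and likewise $\hat{S}_y$, $\hat{C}_{xy}$ and $\hat{Q}_{xy}$ from $\tilde{S}_y$, $\tilde{C}_{xy}$ and $\tilde{Q}_{xy}$,
at the points $f = rf_0/R$.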
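
The two stages of the spectral calculation (raw cosine or sine transform of the corrected covariances, then the .23, .54, .23 averaging) might be sketched as below. The scale factor 1/f0 is an assumption chosen to agree with a spectrum defined as twice the cosine transform of the covariance; the function names are hypothetical.

```python
import math


def raw_cosine_spectrum(rho, f0):
    """Raw estimates at f = r*f0/R, r = 0..R, from covariances rho[0..R]."""
    R = len(rho) - 1
    out = []
    for r in range(R + 1):
        s = rho[0] + rho[R] * math.cos(r * math.pi)
        s += 2.0 * sum(rho[m] * math.cos(r * m * math.pi / R) for m in range(1, R))
        out.append(s / f0)
    return out


def raw_sine_spectrum(psi, f0):
    """Raw quadrature estimates from the odd covariance part psi[0..R]."""
    R = len(psi) - 1
    return [
        2.0 * sum(psi[m] * math.sin(r * m * math.pi / R) for m in range(1, R)) / f0
        for r in range(R + 1)
    ]


def hamming_smooth(raw, odd=False):
    """Average neighbors with weights .23, .54, .23.

    The end points use the reflection rules for r = -1 and r = R + 1:
    even reflection for S and C, odd reflection (sign change) for Q.
    """
    R = len(raw) - 1
    sign = -1.0 if odd else 1.0
    ext = [sign * raw[1]] + list(raw) + [sign * raw[R - 1]]
    return [0.23 * ext[r] + 0.54 * ext[r + 1] + 0.23 * ext[r + 2] for r in range(R + 1)]
```

For a covariance that vanishes at all nonzero lags (white noise) both the raw and the smoothed spectra come out flat.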
Finally, one may identify the cross-spectral estimates as

$$\hat{S}_{xy}(f) = \hat{C}_{xy}(f) - i\hat{Q}_{xy}(f)$$

at the points $f = rf_0/R$. Then, following (6) and (4), one may compute the
estimated gain and phase of the frequency response function of the system
(5) from

$$\hat{L}(f) = \hat{G}(f)e^{i\hat{P}(f)} = \frac{\hat{S}_{xy}(f)}{\hat{S}_x(f)}; \qquad \hat{G}(f) \ge 0$$

and, following (7), the estimated coherence between the system input and
output as

$$\hat{\gamma}^2(f) = \frac{|\hat{S}_{xy}(f)|^2}{\hat{S}_x(f)\hat{S}_y(f)} \tag{20}$$

at the same points $f = rf_0/R$.
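
The last step, from smoothed spectra to gain, phase and coherence at one frequency, is short enough to sketch directly. The function name is hypothetical, and the cross spectrum is assumed to be assembled as C - iQ.

```python
import cmath


def response_and_coherence(Sx, Sy, Cxy, Qxy):
    """Estimated gain, phase (radians) and squared coherence at one frequency."""
    Sxy = complex(Cxy, -Qxy)        # cross spectrum C - iQ
    L = Sxy / Sx                    # frequency response estimate
    coherence = abs(Sxy) ** 2 / (Sx * Sy)
    return abs(L), cmath.phase(L), coherence


if __name__ == "__main__":
    gain, phase, coh = response_and_coherence(2.0, 8.0, 0.0, -4.0)
    print(gain, phase, coh)
```

In the example the output is exactly twice the input with a quarter-cycle phase shift, so the coherence is unity; any uncorrelated disturbance added to the output would raise Sy and pull the coherence below one.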

This completes the statement of the computing formulas. We conclude this 
Section with a statement of the key statistical result used in this study, Good- 
man’s approximate confidence statement for the estimated system gain and 
phase functions $\hat{G}$, $\hat{P}$. The statement is, at each frequency $f = rf_0/R$,


1 zal 7 k/2 
< sin ‘| v1- (ose ; (21) 


Prob. {| P— P| < and lg 1 





where $G$ and $P$ are the underlying gain and phase, and $\gamma^2$ the underlying coherence
at the frequency. The parameter k appears in (21) as the number of
statistical degrees of freedom per spectral calculation point, and may be determined
from the record length N and the resolution number R as


y r #0 (f # 0) 
ke (22) 
x fae. eg 


It turns out that on the computing scheme presented here only alternate spectral
estimates are substantially uncorrelated. The number of degrees of freedom k
is thus roughly the number of data pairs per independently resolved frequency
in the fundamental range 0 to $f_0$.

It should be pointed out that the use of the confidence statement (21) calls
for a knowledge at every frequency of the true underlying coherence $\gamma^2$, quantities
that one cannot in general expect to know. All one can get from the data
are the estimated sample coherences $\hat{\gamma}^2$ of (20). Goodman gives the probability
distribution of the sample coherence, but his formulas, untabulated, are of
limited direct usefulness. There does not seem to be anything much better
available to the immediate user than to look at the $\hat{\gamma}^2$ computed from (20), and
then take some conservative guess at the $\gamma^2$ to use in (21). This makes the use
of (21) to assess the sampling variability of a frequency response characteristic
estimated from a given experimental run a relatively straightforward matter,
and is indeed what was done in the present study.

The order of events in designing an experimental run may be taken as follows.
First, estimate a Nyquist cutoff frequency $f_0$ from a guess at the effective frequency
range of the spectral shapes that will be encountered. Second, establish
a required resolution number R from a guess at the jaggedness of these spectral
shapes. Finally, assign a precision $\epsilon$ and a confidence (Prob.) level in (21), guess
a coherence $\gamma^2$, and compute the associated k. This, through (22), will determine
the number of data pairs N, and since the sampling interval h is known from
(19), the overall clock time of the run will be determined as well. To validate
these guesses about the spectral shapes and coherences may of course require
some amount of pilot experimentation and the accompanying pilot spectral
calculations.
1. INTRODUCTION


The experimental scientist is frequently faced with the task of determining
the functional relation between a ‘response’, y, and a number of ‘inputs’,
$x_1, x_2, \cdots, x_k$, with the help of empirical data. Often the mathematical form
of the functional relation is assumed to be known and is written in the form of
a ‘regression function’,

$$Y = f(x_1, x_2, \cdots, x_k; \theta_1, \theta_2, \cdots, \theta_m), \tag{1}$$

where the (k + m) variable function f is given mathematically but the unknown
parameters, $\theta_1, \cdots, \theta_m$, have to be estimated from a set of observed responses,
y, and associated inputs $x_1, \cdots, x_k$. As is well known, the method of estimation
most frequently employed for the estimation of the $\theta_i$ in (1) is the method of
Least Squares: With this method the differences, y - f, between the observed
responses, y, and the responses computed from the associated inputs $x_1, \cdots, x_k$
through the regression function (1) are formed using a trial set of parameters
$\theta_1, \cdots, \theta_m$. The sum of squares of these differences, $Q = \sum (y - f)^2$, is then
an m-variable function of the trial parameters $\theta_1, \cdots, \theta_m$, and is minimized as a
function of these parameters.

When the regression function f is linear in the parameters the m equations
resulting from setting $\partial Q/\partial\theta_i = 0$ for $i = 1, 2, \cdots, m$ are linear in the $\theta_i$ and
are known as the ‘normal equations’ in multilinear regression estimation, a
special case of the well developed theory of ‘linear estimation’.

When the regression function is non-linear in the parameters, both the theory
and the practice of the estimation procedure are considerably more difficult.

Most of this paper is concerned with the methodology of non-linear regression 
problems, that is, with the numerical technique of computing Least Squares 
estimates as solutions of the system of non-linear equations. After posing the 
problem (Section 2) we develop a modification of the well known ‘Gauss-Newton’ 
method of iterative solution. This new method, whilst sharing the advantageous
features of the Gauss-Newton method, has the additional merit of guaranteed
convergence (Section 3).





* This work was sponsored by the U. S. Army Ordnance Corps under Contract No. 
DA-11-022-ORD-2732 Project No. 401-04-17-95-2633. 
Whilst the solution of the normal equations in linear regression theory is
usually unique and yields the absolute minimum of Q, both these properties
require special examination with non-linear problems, and some of these questions
are discussed in Section 4 where a theorem on uniqueness is also proved.
The numerical procedure is then spelt out in more detail and
illustrated with a numerical example of fitting the exponential regression with
asymptote involving three parameters (Section 5). Certain short cut methods
are also discussed in this section. In a later section (Section 6), certain questions
concerning the optimality of the Least Squares estimation are invoked from
the properties of maximum likelihood estimation. Comparisons with the literature
are made in Section 7. The reader who is interested in the numerical procedure
only and not in the mathematical proofs is advised to turn to the
worked example of Section 5 and check the formulas by referring to the description
of the general procedure given in Sections 2 and 3.


2. THE FORMULATION OF THE NON-LINEAR REGRESSION PROBLEM


Consider the following non-linear least squares problem: Given n sets of
‘observed’ (k + 1)-tuples $y_h; x_{1h}, x_{2h}, \cdots, x_{kh}$ $(h = 1, 2, \cdots, n)$, and a function

$$f(x; \theta) \equiv f(x_1, \cdots, x_k; \theta_1, \cdots, \theta_m), \tag{2}$$

it is required to determine a set of $\theta_i$ for which the sum of squares

$$Q(\theta) = \sum_h (y_h - f(x_h; \theta))^2 = \text{Min.} \tag{3}$$

Here the symbol $x_h$ stands for the k-vector with elements $x_{1h}, \cdots, x_{kh}$ and
the symbol $\theta$ for the m-vector with elements $\theta_1, \cdots, \theta_m$. The function $f(x, \theta)$
is assumed to satisfy the following conditions:

a. Introducing the first and second derivatives of $f(x, \theta)$ with regard to the
$\theta_i$ as

$$\frac{\partial f}{\partial\theta_i} = f_i(x, \theta); \qquad \frac{\partial^2 f}{\partial\theta_i\,\partial\theta_j} = f_{ij}(x, \theta), \tag{4}$$

the $f_i$ and $f_{ij}$ are assumed continuous functions of the $\theta$ for all k-tuples
$x_h$ $(h = 1, 2, \cdots, n)$.
The above assumption permits the following definitions to be made:

$$\frac{\partial Q}{\partial\theta_i} = Q_i = -2\sum_h (y_h - f(x_h; \theta))\,f_i(x_h; \theta)$$

$$\frac{\partial^2 Q}{\partial\theta_i\,\partial\theta_j} = Q_{ij} = -2\sum_h (y_h - f(x_h; \theta))\,f_{ij}(x_h; \theta) + 2\sum_h f_i(x_h; \theta)\,f_j(x_h; \theta) \tag{5}$$

b. The following assumption is equivalent to the well known assumption
of non-degeneracy of rank in a linear least square problem: We assume
that for any (non-trivial) set of $u_i$ $(i = 1, 2, \cdots, m)$ with $\sum u_i^2 > 0$,

$$\sum_{h=1}^{n}\left(\sum_{i=1}^{m} u_i f_i(x_h; \theta)\right)^2 > 0 \tag{6}$$

for the ‘observed’ vectors $x_h$ and for all $\theta$ in a bounded convex set S of the
parameter space $\theta_1, \cdots, \theta_m$. This assumption will usually be satisfied
in practical situations.

c. The next assumption is a convenient one to ensure the convergence of the
iterative process of solution to be described in Section 3 below: Denote by

$$\bar{Q} = \liminf_{\bar{S}} Q(x, \theta), \tag{7}$$

where $\bar{S}$ is the complement to S. Then it is assumed that it is possible to
find a vector ${}_0\theta$ in the interior of S such that

$$Q(x; {}_0\theta) < \bar{Q}. \tag{8}$$

The formal ‘Least Squares equations’ corresponding to the Least Squares
problem (3) are given by

$$Q_i(x, \theta) = 0, \qquad i = 1, 2, \cdots, m \tag{9}$$


using the notation (4) and (5). In the next section we shall describe an iterative 
process which will converge to a solution of the Least Squares equations (9) 
under the assumptions a to c. A sharpening of assumption b is then necessary 
to ensure that (i) the solution reached gives the absolute minimum of the Least 
Squares form (3), and (ii) that the solution is unique. 


3. THE MODIFIED GAUSS-NEWTON ITERATION


We give first a formal description of the process. Later, in Section 5, we 
discuss computational short cuts which will be seen to considerably simplify 
the numerical execution of the procedure:— 

We start with the vector ${}_0\theta$ stipulated in assumption c. The first step is to
compute ‘corrections’ to the elements ${}_0\theta_i$ of the starting vector ${}_0\theta$. These corrections
will be proportional to the solutions $D_i$ of the Gauss-Newton equations
corresponding to (3). The latter equations are obtained in the familiar manner
by substituting a multiple 1st order Taylor expansion of $f(x, \theta)$ at $\theta = {}_0\theta$ in
(3) and writing the resulting linear least squares equations, viz:

$$\sum_{j=1}^{m} 2\left\{\sum_h f_i(x_h; {}_0\theta)\,f_j(x_h; {}_0\theta)\right\} D_j = -Q_i(x; {}_0\theta), \tag{10}$$

where $Q_i(x; \theta)$ is given by (5). Because of assumption b the matrix of the
linear equations (10) has rank m and the equations can thus always be solved, yielding the
elements $D_i$ of the vector D as solutions. Consider now the function

$$Q(v) = Q(x; {}_0\theta + vD), \qquad \text{for } 0 \le v \le 1, \tag{11}$$

and denote by $v'$ the value of v for which Q(v) is a minimum on the interval
$0 \le v \le 1$. Defining the vector

$${}_1\theta = {}_0\theta + v'D \tag{12}$$

with elements ${}_1\theta_i = {}_0\theta_i + v'D_i$ we have obviously

$$Q(x, {}_1\theta) \le Q(x, {}_0\theta) < \bar{Q}, \tag{13}$$

so that ${}_1\theta$ clearly lies in the interior of S. The above computation is now repeated
at ${}_1\theta$ and so on. There results a sequence of vectors ${}_t\theta$, $t = 1, 2, \cdots$, all
within the bounded convex set S, with

$$\lim_{t\to\infty} Q(x, {}_t\theta) = Q^* \text{ (say)}. \tag{14}$$

Consider a point of accumulation $\theta^*$ of this bounded sequence and a subsequence
${}_\tau\theta$ with

$$\lim_{\tau\to\infty} {}_\tau\theta = \theta^* \tag{15}$$

Now since clearly

$$\lim_{\tau\to\infty} Q(x, {}_\tau\theta) = Q(x, \theta^*) \le Q(x, {}_0\theta) < \bar{Q}, \tag{16}$$


it follows from assumption c that $\theta^*$ must be an interior point of S. We now show
that at this limit point $\theta^*$ the first partials $Q_i$ must all be zero: Introduce the
$D_i^*$ as the solution of the equations (corresponding to the equations (10))

$$\sum_{j=1}^{m} 2\left\{\sum_h f_i(x_h; \theta^*)\,f_j(x_h; \theta^*)\right\} D_j^* = -Q_i(x, \theta^*). \tag{17}$$

Because of the continuity assumptions and (6)

$$\lim_{\tau\to\infty} {}_\tau D_i = D_i^*. \tag{18}$$

Further from (17) and (6)

$$\sum_{i=1}^{m} Q_i(x, \theta^*)\,D_i^* = -2\sum_h \left\{\sum_{i=1}^{m} f_i(x_h, \theta^*)\,D_i^*\right\}^2 < 0, \tag{19}$$

provided

$$\sum_{i=1}^{m} D_i^{*2} > 0. \tag{20}$$

But equation (19) implies that the differential of Q in the direction proportional
to the $D_i^*$ is negative. Therefore, it follows from (18) that for all ${}_\tau\theta$ in a small
neighborhood of $\theta^*$ the differential of Q in the direction proportional to the
${}_\tau D_i$ would be smaller than a fixed quantity $-\epsilon$ (say). Since the second differential
of Q in these directions# is bounded over a unit distance by a bound of
say B, it follows that the minimum of Q in the direction of the ${}_\tau D_i$ must be
below $Q(x, {}_\tau\theta)$ by at least the amount $\epsilon v - \tfrac{1}{2}Bv^2$ where v is the fractional distance
moved in the direction proportional to the ${}_\tau D_i$ from the points ${}_\tau\theta$. Choosing
$v^* = \min(1, |\epsilon|/B)$ we find that the minima of Q in the directions proportional to
the ${}_\tau D_i$ would all be below $Q(x, {}_\tau\theta)$ by at least the amount $\tfrac{1}{2}\epsilon v^*$. This contradicts
(14) which states that the $Q(x, {}_t\theta)$ of the original sequence t converge to $Q^*$
which would also be the limit of the subsequence $Q(x, {}_\tau\theta)$. Thus, we reach a
contradiction unless $\sum_{i=1}^{m} D_i^{*2} = 0$ which implies (because of the full rank of
equations (17)) that

$$Q_i(x, \theta^*) = 0. \tag{21}$$

# This differential may be defined as the differential with regard to the variable v as defined
by equation (11) with ${}_\tau\theta$ replacing ${}_0\theta$.
We have shown therefore that a subsequence ${}_\tau\theta$ of the sequence of vectors ${}_t\theta$
converges to a solution $\theta^*$ of the Least Squares equations

$$Q_i(x, \theta^*) = 0, \qquad i = 1, \cdots, m. \tag{22}$$

It is clear that for practically all problems of non-linear regression the original
sequence of ${}_t\theta$ will converge to $\theta^*$: For if there were an infinite subsequence of
${}_t\theta$ not converging to $\theta^*$ then a subsequence of these would tend to a limit
$\theta^{**} \ne \theta^*$. From the above argument we would now conclude that

$$Q(x, \theta^*) = Q(x, \theta^{**}) \tag{23}$$

and that $\theta^{**}$ must also be a stationary point, i.e.

$$Q_i(x, \theta^{**}) = 0, \qquad i = 1, 2, \cdots, m. \tag{24}$$

It is highly improbable that there be a regression surface and a set of observed
$x_h$ and $y_h$ such that Q has two stationary points yielding precisely the same value
of Q.
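
The iteration of this section can be sketched in a few lines for a three-parameter exponential model. This is only an illustration under stated assumptions: the data are synthetic and error-free (hypothetical values, not those of any experiment), the minimum of Q(v) on 0 <= v <= 1 is located by a coarse grid search rather than by any particular interpolation rule, and all function names are hypothetical.

```python
import math


def model(x, th):
    L, B, K = th
    return L + B * math.exp(K * x)


def gradients(x, th):
    """First derivatives f_i of the model with respect to L, B, K."""
    L, B, K = th
    e = math.exp(K * x)
    return [1.0, e, B * x * e]


def sum_sq(xs, ys, th):
    return sum((y - model(x, th)) ** 2 for x, y in zip(xs, ys))


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][k] * sol[k] for k in range(r + 1, n))) / m[r][r]
    return sol


def modified_gauss_newton(xs, ys, th, cycles=50, grid=20):
    """Gauss-Newton direction D, then the best step fraction v in [0, 1]."""
    for _ in range(cycles):
        g = [gradients(x, th) for x in xs]
        res = [y - model(x, th) for x, y in zip(xs, ys)]
        k = len(th)
        a = [[sum(gh[i] * gh[j] for gh in g) for j in range(k)] for i in range(k)]
        b = [sum(r * gh[i] for r, gh in zip(res, g)) for i in range(k)]
        d = solve(a, b)
        # v = 0 is included, so Q can never increase from cycle to cycle.
        best_v = min(
            (j / grid for j in range(grid + 1)),
            key=lambda v: sum_sq(xs, ys, [t + v * di for t, di in zip(th, d)]),
        )
        th = [t + best_v * di for t, di in zip(th, d)]
    return th
```

Because the trial value v = 0 is always among the candidates, the sequence of Q values is non-increasing, which is the property the convergence argument above relies on.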


4. THE ABSOLUTE MINIMUM OF Q AND THE UNIQUENESS
OF THE LEAST SQUARES SOLUTION


It is clear that the above procedure will yield a solution of the Least Squares
equations $Q_i = 0$. It is also clear that there must be an absolute minimum of Q
inside S. If $\theta^0$ is a vector at which this occurs, clearly $Q_i(x, \theta^0) = 0$. However,
the vector $\theta^*$ to which our process has converged may not be equal to $\theta^0$. It is
therefore necessary to discuss conditions under which it does yield a $\theta^0$ vector.
Let us assume that $\theta^0$ is the only vector yielding the absolute minimum of Q.
We then have as a necessary consequence that the quadratic form

$$\sum_{i,j} Q_{ij}(x, \theta^0)\,u_i u_j > 0 \tag{25}$$

is positive definite and hence will remain positive definite in a convex ‘neighborhood
region’ of $\theta^0$, say $S^*$. We now prove the following uniqueness theorem:

Theorem 1.

In any convex region $S^*$ of the parameter space $\theta$ in which the quadratic form

$$\sum_{i,j} Q_{ij}(x, \theta)\,u_i u_j > 0 \tag{26}$$

is positive definite there can not be more than one stationary point of $Q(x, \theta)$.
The proof is almost obvious. Suppose there were two stationary points $\theta'$ and
$\theta''$ in $S^*$. Consider the function

$$F(v) = \sum_i \left\{Q_i(x, \theta'v + \theta''(1 - v))\,(\theta_i' - \theta_i'')\right\}. \tag{27}$$

Clearly F(0) = F(1) = 0 and hence by Rolle’s Theorem $dF(v)/dv = 0$ for some
value $\bar{v}$ with $0 < \bar{v} < 1$, so that

$$\sum_{i,j} Q_{ij}(x, \theta'\bar{v} + \theta''(1 - \bar{v}))\,(\theta_i' - \theta_i'')(\theta_j' - \theta_j'') = 0, \tag{28}$$

which would contradict (26).
From Theorem 1 it is clear that if the region S involved in conditions b and c
of Section 2 can be chosen to be the region $S^*$ containing the absolute minimum
$\theta^0$ and comprising only vectors $\theta$ for which the quadratic form (26) is positive
definite, then our iterative process will converge to the vector $\theta^0$ yielding the
absolute minimum of Q. In practice, the difficulty about the above proposition
will be to find a starting vector ${}_0\theta$ which is known to be in the region $S^*$. This
is particularly difficult when the surface represented by $Q(x, \theta)$ may have numerous
local minima and/or maxima and/or saddle points. Under such conditions
it is necessary to ‘search’ the parameter space $\theta$ at a wide grid in an attempt
to locate a point in the region $S^*$. If a starting vector ${}_0\theta$ can be located in the
region $S^*$ our method can be used to converge upon $\theta^0$, i.e. the absolute minimum.

TABLE 1

Fitting of Exponential Regression by Modified Gauss-Newton Iteration, Cycle 1
L_0 = 580, B_0 = -180, K_0 = -0.16

  (1)        (4)             (5)
   x      exp(K_0 x)     x exp(K_0 x)
  -5       2.22 554      -11.12 771
  -3       1.61 607      - 4.84 822
  -1       1.17 351      - 1.17 351
   1        .85 214         .85 214
   3        .61 878        1.85 635
   5        .44 933        2.24 665
 Total     6.93 537      -12.19 430

If there is a problem in which the absolute minimum is not unique for all
large samples the Least Squares principle ceases to be an appropriate method
for estimation since it will be incapable of distinguishing between the two solutions.
Finally, it should be stressed that only the (unique) vector $\theta^0$ which
yields the absolute minimum of Q is of interest in statistical estimation theory
(see Section 6). If the Least Squares equations $Q_i = 0$ permit solutions $\theta$ other
than $\theta^0$ these do not necessarily share the properties which make $\theta^0$ a desirable
estimator (see Section 6).


5. ILLUSTRATION BY A NUMERICAL EXAMPLE AND
COMPUTATIONAL SHORT CUTS


We now illustrate the computational procedure of our method with the help 
of a numerical example. In the course of this illustration certain computational 
short cuts will be introduced. 

The data in columns 1 and 2 of Table 1 below have been taken from a ‘Ferti- 
lizer Experiment’ in which the n = 6 responses y,(h = 1, 2, --- , 6) represent 
the yields of wheat corresponding to six rates of application of fertilizer, x, , 
which, on a coded scale, are given the values x, = —5, —3, —1, 1, 3, 5. It is 
intended to fit to these data the exponential law of ‘diminishing returns’, 


Y = f(z; L, B, K) = L + Be™ (29) 


(sometimes called Mitscherlich's Law of diminishing returns) in which there
is only one input variable, x, but m = 3 parameters, namely $\theta_1 = L$, the
asymptotic yield for large rates of fertilizer application, $\theta_3 = K$ (usually negative),
the exponential rate of response decrease, and $\theta_2 = B$ (usually negative),
defining the mid-point response (at x = 0) by L + B.

Clearly in accordance with (4)


    $f_1 = \partial f/\partial L = 1$;   $f_2 = e^{Kx}$;   $f_3 = B x e^{Kx}$    (30)


and the equations corresponding to (10) can be spelt out as shown in Table 2
below where, in usual fashion, the unknowns $D_1$, $D_2$, $D_3$ are written as column
headings.


TABLE 2

Formulas for equations for $D_1$, $D_2$ and $D_3$

    $D_1$                       $D_2$                        $D_3$
    $n$                         $\sum e^{K_0 x}$             $B_0 \sum x e^{K_0 x}$           $= \sum (y - f)$
    $\sum e^{K_0 x}$            $\sum e^{2K_0 x}$            $B_0 \sum x e^{2K_0 x}$          $= \sum (y - f) e^{K_0 x}$
    $B_0 \sum x e^{K_0 x}$      $B_0 \sum x e^{2K_0 x}$      $B_0^2 \sum x^2 e^{2K_0 x}$      $= B_0 \sum (y - f) x e^{K_0 x}$


Here (y - f) is the residual with f evaluated at the starting vector with
elements $L_0$, $B_0$ and $K_0$. In our example these trial values were taken as


    $L_0 = 580$,   $B_0 = -180$,   $K_0 = -.160$.    (31)


In Table 1 we show the stages of computations required to evaluate the elements
of the matrix coefficients in Table 2: Columns (1) and (2) show the original
data x and y, column (3) the products $K_0 x$, column (4) exp($K_0 x$) from a suitable
table of the exponential function (the decimal accuracy is more than is required
for the initial cycles), column (5) the products x exp($K_0 x$) and column (6)
the residuals


    $y - f = y - 580 + 180 \exp\{K_0 x\} = (2) - 580 + 180\,(4)$.


It is clearly convenient to divide the last row and column in the matrix of Table
2 by $B_0$ and thereby obtain $D_1$, $D_2$ and $D_3 B_0$ as solutions. All elements in Table 2
are then seen to be either the totals of columns (4), (5) and (6) in Table 1 or the
sums of squares and cross products of these columns. They are set out in Table 3
below.


The solution of these equations yields 


    $D_1 = -89.5838$,   $D_2 = 58.8877$,   $D_3 = -.063117$.


TABLE 3
Cycle 1, Equations for $D_1$, $D_2$ and $B_0 D_3$

    $D_1$         $D_2$          $B_0 D_3$
     6.            6.93537      -12.19430     = -267.634
     6.93537      10.252765     -31.093050    = -370.782306
   -12.19430     -31.093050     157.927907    = 1055.627355


TABLE 4
Cycle 1, Computation of Starting Values for Cycle 2

                  v = 1                               v = ½
     Kx         exp Kx       y - f          Kx         exp Kx       y - f

    1.115585    3.05135       6.140         .957795    2.60594     -15.868
     .669351    1.95297    -102.888         .574677    1.77656    -116.736
     .223117    1.24997      39.971         .191559    1.21114      26.136

    -.223117     .80002      27.476        -.191559     .82567      10.102
    -.669351     .51204      31.598        -.574677     .56289       9.538
   -1.115585     .32772     -24.725        -.957795     .38374     -51.434


The next stage, in accordance with equation (11), would be to find the minimum
value of Q as a function of v by substituting


    $L = L_0 + vD_1$,   $B = B_0 + vD_2$,   $K = K_0 + vD_3$    (32)


in f and hence Q, which then becomes a function of v. In order to find the
minimum of Q we proceed by an approximate method: We evaluate Q for
v = 0, v = ½ and v = 1 and determine the level of v for which the parabola
through Q(0), Q(½), Q(1) attains its minimum from*


    $v_{\min} = \tfrac{1}{2} + \tfrac{1}{4}\,(Q(0) - Q(1))/(Q(1) - 2Q(\tfrac{1}{2}) + Q(0))$.    (33)


In this formula $Q(0) = \sum (y - f)^2$ can be evaluated directly from the (y - f)
values given in column (6) of Table 1. We obtain Q(0) = 27376.825. The
computation of Q(½) and Q(1) requires the evaluation of the exponential regression for


    $L_0 + \tfrac{1}{2}D_1 = 535.2081$,   $B_0 + \tfrac{1}{2}D_2 = -150.55615$,   $K_0 + \tfrac{1}{2}D_3 = -.191559$
and
    $L_0 + D_1 = 490.4162$,   $B_0 + D_2 = -121.11230$,   $K_0 + D_3 = -.223117$.


This work is shown in Table 4. We obtain Q(½) = 17400.6578 and
Q(1) = 14586.01079 and accordingly, from the parabola formula (33), the fraction
$v_{\min}$ = .94651. Thus the minimum of Q on the line determined by (32) occurs at


    $L_1 = 495.207$,   $B_1 = -124.2621$,   $K_1 = -.219741$.
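
As a check on the arithmetic of this cycle, the three-point parabola rule (33) can be evaluated directly from the quoted values of Q. The sketch below is illustrative only: the function name `parabola_vmin` is our own, and the inputs are the figures printed in the text and in Table 5.

```python
def parabola_vmin(q0, q_half, q1):
    """Return the v at which the parabola through (0, q0), (1/2, q_half), (1, q1) is least."""
    curvature = q1 - 2.0 * q_half + q0
    if curvature <= 0.0:
        # The fitted parabola is not convex, so it has no minimum.
        raise ValueError("parabola through the three points has no minimum")
    return 0.5 + 0.25 * (q0 - q1) / curvature


# Q(0), Q(1/2) and Q(1) as quoted in the text for cycle 1.
v_min = parabola_vmin(27376.825, 17400.6578, 14586.01079)

# Move from the starting values (31) along the cycle-1 corrections (D1, D2, D3).
start = (580.0, -180.0, -0.160)
corrections = (-89.5838, 58.8877, -0.063117)
L1, B1, K1 = (s + v_min * d for s, d in zip(start, corrections))
print(round(v_min, 5), round(L1, 3), round(B1, 3), round(K1, 6))
```

The guard on the curvature reflects the fact that the rule only makes sense when the three Q values are convex in v; what to do otherwise is a design choice not settled by the formula itself.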


The process is now repeated with $L_1$, $B_1$ and $K_1$ taking the place of the starting
trial values $L_0$, $B_0$ and $K_0$. In Table 5 we give the four cycles required to
converge upon the absolute minimum of Q. The final least squares estimates are


    $L = 523.3$,   $B = -157.0$,   $K = -.1994$.    (33)
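
The whole procedure of this section can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the author's program: the yields in `Y` are not quoted in this part of the text and have been back-computed from the residuals printed in Table 4 (they should be checked against Table 1), the helper names are our own, and the step rule combines the parabola formula (33) with the halving safeguard mentioned in the footnote.

```python
import math

X = [-5.0, -3.0, -1.0, 1.0, 3.0, 5.0]
# ASSUMPTION: yields back-computed from the residuals printed in Table 4.
Y = [127.0, 151.0, 379.0, 421.0, 460.0, 426.0]


def q(theta):
    """Residual sum of squares for Y = L + B*exp(K*x)."""
    L, B, K = theta
    return sum((y - L - B * math.exp(K * x)) ** 2 for x, y in zip(X, Y))


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def direction(theta):
    """Gauss-Newton corrections (D1, D2, D3) from the normal equations of Table 2."""
    L, B, K = theta
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for x, y in zip(X, Y):
        e = math.exp(K * x)
        partials = (1.0, e, B * x * e)  # the derivatives (30)
        resid = y - L - B * e
        for i in range(3):
            b[i] += partials[i] * resid
            for j in range(3):
                a[i][j] += partials[i] * partials[j]
    return solve(a, b)


def cycle(theta):
    """One cycle: Gauss-Newton direction, then the parabola rule (33) on 0 < v <= 1."""
    d = direction(theta)

    def at(v):
        return [t + v * di for t, di in zip(theta, d)]

    q0, q_half, q1 = q(theta), q(at(0.5)), q(at(1.0))
    curvature = q1 - 2.0 * q_half + q0
    v = 0.5 + 0.25 * (q0 - q1) / curvature if curvature > 0.0 else 1.0
    v = min(v, 1.0)
    if v <= 0.0:
        v = 0.5
    # Safeguard: halve the segment until Q no longer exceeds Q(0).
    while v > 1e-12 and q(at(v)) > q0:
        v *= 0.5
    return at(v) if q(at(v)) <= q0 else list(theta)


theta = [580.0, -180.0, -0.160]  # starting values (31)
for _ in range(40):
    theta = cycle(theta)
print([round(t, 4) for t in theta], round(q(theta), 2))
```

Running a fixed number of cycles keeps the sketch short; a practical program would instead stop when the corrections become negligible.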


* An attractive alternative suggested to me by Dr. K. Ruedenberg is to find the minimum
of the parabola passing through Q(0) and Q(1) with the negative slope S at v = 0 given by
(19). This minimum occurs at $v_{\min} = -S/\{2(Q(1) - Q(0) - S)\}$. In any case the computer
program should always test whether the new minimum $Q(v_{\min})$ is actually smaller than Q(0)
and, if this check fails, should redo the computation on a segment of half-length.


TABLE 5
Convergence of Least Squares Estimates in 4 Cycles
of Iteration with Starting Values (31)

                 L           B            K          $D_1$        Q(0)

    Cycle 1    580.000    -180.000    -.160000    -89.5838    27376.82
          2    495.208    -124.262    -.219741     33.0685    14590.58
          3    524.960    -159.273    -.190648     -5.7328    13639.10
          4    519.422    -152.488    -.203545      5.2393    13394.35
    Final      523.3      -157.0      -.1994


In order to illustrate how in this example the convergence depends on the 
choice of the starting values, we have set out in Table 6 the analogous informa- 
tion to Table 5 for the starting values: 


    $L_0 = 500$,   $B_0 = -140$,   $K_0 = -0.18$.    (34)


These starting values, used merely for purposes of illustration, are actually 
‘closer’ to the final estimates than the previous ones. For work with desk com- 
puters it usually pays to compute fairly good starting values from short cut 
estimation procedures. For work on high speed computers it is sometimes more 
convenient to avoid a special program for the computation of good starting 
values and consequently perform a larger number of cycles in the iteration 
process. It will be seen that in the present example with the starting values (34) 
after four cycles virtually the same final answers as in (33) are reached. 

TABLE 6

Convergence of Least Squares Estimates in 4 Cycles
of Iteration with Starting Values (34)

                 L           B           K         $D_1$      $D_2$       $D_3$        Q(0)

    Cycle 1    500.000    -140.000    -.18000    12.325     -4.389     -.03583     18428.00
          2    511.156    -143.972    -.21250    15.866    -17.516      .01856     13433.10
          3    521.040    -154.884    -.20094     2.734     -2.618      .001826    13392.94
          4    523.805    -157.532    -.19909    -0.717       .83976   -.000817    13390.23
    Final      523.3      -156.9      -.1997


It is of interest to check whether the condition (6) is, in fact, satisfied, which
would mathematically ensure the convergence of the procedures just found to
converge to the same limit in our numerical illustrations. Without loss of
generality condition (6) can be written in the form


    $S = \sum_{h=1}^{n} (u_1 + u_2 e^{K x_h} + u_3 x_h e^{K x_h})^2 > 0$    (35)


for any set of $u_1$, $u_2$, $u_3$ with $u_1^2 + u_2^2 + u_3^2 = 1$.
Now S = 0 is clearly only possible if each of the quantities inside the ( )$^2$ is
zero. We now show that this is not possible if there are at least three different
values of $x_h$ among the n observed $x_h$-values. In fact, the equation


    $u_1 + u_2 e^{Kx} + u_3 x e^{Kx} = 0$    (36)


for any set of given K, $u_1$, $u_2$, $u_3$ with $u_1^2 + u_2^2 + u_3^2 = 1$ has at most two roots
x. To prove this write (36) first in the form


    $g(x) = (u_2 + u_3 x) + u_1 e^{-Kx} = 0$    (37)


and observe that g(x) = 0 cannot have more than two distinct roots x. For if
it had three or more distinct roots there would, by Rolle's theorem, be at least
one root of $d^2 g(x)/dx^2 = 0$, which is not possible since $d^2 g/dx^2 = u_1 K^2 e^{-Kx}$.
(Note that if $u_1 = 0$, g(x), the equation of a line, has at most one root if
$u_2^2 + u_3^2 = 1$.)

It should be pointed out that following standard regression analysis the three
equations in Table 2 can be immediately reduced to two equations by elimination
of $D_1$ with the help of the first equation, thereby reading sums of squares
and products of deviations of y - f, $e^{K_0 x}$ and $x e^{K_0 x}$ as coefficients for $D_2$ and $D_3$.

Computations, not here presented, would show that in the present example
the standard (unmodified) Gauss-Newton method would have worked quite
well so that the benefits derived from our modification are not here apparent.
They will, however, be considerable in situations in which the Gauss-Newton
method does not converge. In the present example the differentials (30) of
f with regard to the parameters are comparatively simple functions. However,
these differentials frequently are analytically involved expressions
whilst the function $f(x_1, \ldots, x_k; \theta_1, \ldots, \theta_m)$ itself is easy to evaluate. In
such situations one may prefer to evaluate m + 1 sets of f-functions, namely
those for the ‘change of one parameter at a time’ pattern:


    ${}_t\theta_i = {}_0\theta_i + \delta_{it}\,\Delta$;   t, i = 1, 2, ..., m,    (38)


where $\delta_{it}$ is the Kronecker $\delta$ and $\Delta$ is a conveniently chosen small increment.
The differentials of f with regard to $\theta_i$ are then computed from the difference
ratios. This method is particularly convenient on high speed computers as it
only requires a program for the evaluation of the regression function f itself.
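
The ‘one parameter at a time’ pattern can be sketched as follows. This is a hedged illustration rather than a prescribed routine: the names `model` and `difference_ratios` and the increment `delta = 1e-6` are our own choices, and the exponential law (29) is used only so that the result can be compared with the analytic derivatives (30).

```python
import math


def model(x, theta):
    """The exponential law (29): f = L + B*exp(K*x)."""
    L, B, K = theta
    return L + B * math.exp(K * x)


def difference_ratios(f, x, theta, delta=1e-6):
    """Approximate df/d(theta_i) by changing one parameter at a time.

    Uses m + 1 evaluations of f: one at theta and one per shifted parameter.
    """
    base = f(x, theta)
    ratios = []
    for t in range(len(theta)):
        shifted = list(theta)
        shifted[t] += delta  # only the t-th parameter is incremented
        ratios.append((f(x, shifted) - base) / delta)
    return ratios


theta0 = [580.0, -180.0, -0.160]
approx = difference_ratios(model, 3.0, theta0)
exact = [1.0, math.exp(-0.48), -180.0 * 3.0 * math.exp(-0.48)]  # from (30)
print(approx, exact)
```

The increment trades truncation error against rounding error, so a single fixed value will not suit every parameter scale; that choice is left open here.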


6. STATISTICAL PROPERTIES OF THE LEAST SQUARES ESTIMATORS 


We confine ourselves here to certain results which follow directly from the
properties of maximum likelihood estimation. Let us assume then in this section
that the observed responses $y_h$ result from the model


    $y_h = f(x_{1h}, \ldots, x_{kh}; \theta_1, \ldots, \theta_m) + e_h$,


where the ‘error residuals’ $e_h$ are independent normal variates with mean zero
and variance $\sigma^2$. Under these assumptions the Least Squares estimators are
identical with the maximum likelihood estimators. Certain results by A. Wald
(1949) can now be applied to the present situation. Wald proved under very
general continuity conditions that the parameter vector $\theta^*$ which yields the
absolute maximum of the likelihood is a consistent estimator of the true parameter
vector. This result, taken in conjunction with Cramér's (1946) results
on consistent solutions of the maximum likelihood equations, implies that the $\theta^*$
which yields the absolute maximum of the likelihood is fully efficient. Moreover,
since a consistent solution of the likelihood equations is unique it is very unlikely
that there will be situations in which any solutions of the likelihood equations
(i.e. the Least Squares equations $Q_i = 0$) other than $\theta^*$ will be fully efficient
or, indeed, have any desirable properties for serving as estimator of the true
parameter vector. Since we have merely shown that under the very general
conditions a, b and c of Section 2 our process will converge to a solution of the
Least Squares equations $Q_i = 0$ it is clear that it is necessary either to examine
the uniqueness of this solution or, in case there is no unique solution, to take
such precautions as to ensure that the absolute minimum of Q is attained by the
process.


7. SOME COMPARISONS WITH ALTERNATIVE
LEAST SQUARES TECHNIQUES


Only a few points of a general nature can be made at the present time. As 
experience in the use of the method is accumulated in various fields of applica- 
tions its advantages and disadvantages will become better substantiated. 

Compared with the well known method of steepest descent each step of the
present procedure is computationally more elaborate. With the former the
direction of movement in the parameter space is made proportional to the $Q_i$.
That is, only the first derivatives of Q enter into the trial and error approximation
of the surface Q. By contrast our procedure (as does the Gauss-Newton
method) employs a first order approximation for the residuals whose squares
comprise Q, and is therefore based on a second order approximation to Q. It
should be noted that this approximation is not identical with the second order
Taylor expansion of Q except in the special case when all second derivatives
$\partial^2 f/\partial\theta_i\,\partial\theta_j$ are zero. In general the second order approximations will differ slightly
and there has been some discussion in the literature (S. L. Piotrowsky (1948)
and H. Wolf (1951)) as to their relative merits. It is, however, generally agreed
that the number of iterative steps with a second order approximation to Q
should be smaller and the unmodified Gauss-Newton method, if it does converge,
will usually converge faster than the ‘steepest descent’ method.

It should be also pointed out that other modifications of the Gauss-Newton 
method have been attempted in the past notably the one by Levenberg (1944). 
However, all these methods require to a varying degree judgements which depend 
essentially on an advance knowledge of the higher differentials of Q in the near 
solution region. Sometimes such knowledge can be accumulated as the parameter 
space is explored in the course of the iteration, but such procedures do not lend 
themselves well to programming on modern computers. 

Finally we should briefly mention the relation of our method of Least Squares 
solution to modern procedures of finding maxima (or minima) of statistical 
response surfaces subject to observational errors. Although there are certain 
features common to the two problems, there are essential differences in the 
economical aspects of different operations in the two problems. For example, 
with Least Squares work the evaluation of Q for a new point in the parameter 
space is a comparatively ‘cheap’ operation whilst the analogous determination 
of the response surface Q requires the often costly evaluation of the response 
for a new point in the ‘factor space’. It is not surprising therefore that the most 
economical ‘strategy’ in the two problems will differ. A strategy suitable for
response surface work and based on quadratic programming has recently been
sketched by E. M. L. Beale (1958). Earlier G. E. P. Box and K. B. Wilson (1951)
had introduced the method of steepest ascent into this area and initiated a
series of papers examining the capabilities and limitations of this method in
industrial response problems.

It was due to the fact that I had a preview of the above report by E. M. L.
Beale (1958) prepared by the S.T.R.G. at Princeton that I failed to make
inquiries from members of that group concerning other memoranda on the topic.
It was not until after submission of this manuscript that my attention was
drawn to the existence of an IBM Share program written in February 1959 by
G. W. Booth and T. I. Peterson under the guidance of G. E. P. Box and M. E.
Muller as a cooperative effort of the Statistical Techniques Research Group
(Princeton), Department of Mathematics (Princeton University) and the
Mathematics and Applications Department, Data Processing Division, IBM.
This program, although it differs from the present procedure in detail, does in
fact employ the same basic idea of moving in the direction of the Gauss-Newton
solution of equations (10). Moreover my attention has been drawn to a mention
of this idea earlier in a paper by G. E. P. Box (1958) p. 220 lines 13-18. Finally
reference should be made to a published abstract of a paper by M. B. Wilk
(1958) in which a different modification of the Gauss-Newton method is
proposed. The above references have now been incorporated in the list which follows.


On the Possibility of Improving the Mean Useful
Life of Items by Eliminating Those
with Short Lives

Durham, North Carolina


When everything possible has been done to produce articles with long lives, there 
remains the possibility that a further improvement in the articles may be obtained 
by running them, for some time, under realistic conditions. The fraction that does not 
fail may have a longer mean remaining life than the original articles. In this paper 
conditions on the life distribution of the original articles are found which will insure 
this. The Weibull, gamma, exponential, extreme value and log-normal life distribu- 
tions are examined in detail. The most interesting case is the log-normal, for which it is 
always possible to increase the mean life to any extent desired by continuing to test 
until a sufficiently large number of articles have failed. 


I. INTRODUCTION 


The quality of many manufactured articles (e.g., light bulbs, vacuum tubes, 
transistors) may be taken to be the length of time they give satisfactory service, 
i.e., the “lifetime”. Having improved the manufacturing process as much as 
possible, the only way to increase the quality of the product sold is to remove 
from the output those items whose lives are short. 

This may often be done by tests and inspection procedures which do not
consume any of the life of the article. If criteria are set up in this way, a certain
fraction $\pi$ of output will fail to pass. If the items in this “fraction defective”
have a mean life of $\mu_1$, and the remaining items have a mean life of $\mu_2$, then
the criteria used should ensure that $\mu_2$ is decidedly larger than $\mu_1$. The mean
of the life L of the output is


    $E(L) = \pi\mu_1 + (1 - \pi)\mu_2$.


Thus, testing has, at the price of rejecting a fraction $\pi$ of the output, increased
the mean lifetime of the saleable product from E(L) to $\mu_2$. Unless $\mu_2$ is
considerably larger than $\mu_1$, no substantial improvement is made unless the
rejection rate is high. Procedures of this kind are invariably used to eliminate
an obviously inferior product which has a very short mean life.

If the mean life of the fraction of the output $(1 - \pi)$ that passes inspection
is still not satisfactory, it may be that the testing criteria do not test some vital
aspect of the items which is only stressed by actual operation. A test which
involves actual operation of all items produced will consume some of the life
of the items tested. The question then arises of whether the elimination of weak
items compensates for the life used on the test bench. If the items fail because
of truly random effects, it is clear that no amount of initial operation will alter
the quality of the product. This is the case when lifetimes are exponentially
distributed.

Thus, this method will not be successful for all life distributions. And even 
if it is successful, it may be a very costly method if it leads to the rejection of 
a large fraction of the product. This, of course, depends on the relative costs 
of testing and rejection in the factory and of failure in the field. Failures in 
the field may well be so expensive that a ruthless policy may be justified in the 
factory or in the acceptance procedures. 

Thus the object of this paper is (i) to find general conditions on a life distri- 
bution so that the mean remaining life of articles, operated for some fixed test 
period, is greater than the original mean life, (Section 2); (ii) to examine well- 
known life distributions—Weibull, gamma, log-normal, extreme value—in the 
light of the results of (i), (Section 3); (iii) to relate numerically possible increases 
in the mean remaining life to the fraction of the output rejected in order that 
an economic assessment may be made of the value of the method in practice, 
(Section 4). 


II. THEORY


Suppose that the product, after any kind of inspection to increase the quality
that does not consume any life of the product, has a lifetime distribution with
a density function f(t), when operated in a certain well-defined manner. Thus,
if L is the lifetime of a randomly chosen item,


    $\mathrm{Prob}(L < t) = \int_0^t f(x)\,dx = F(t)$.    (2.1)


Since $0 \le L < \infty$, F(0) = 0, F($\infty$) = 1. F(t) is the life distribution function
with, by assumption, a finite mean $\mu_0$ defined by


    $\mu_0 = \int_0^\infty x f(x)\,dx$.    (2.2)


Now suppose that every item of the product is put in operation in the standard
way and run until either the item fails or a time T elapses, whichever occurs
first. A fraction F(T) of the product will, in the long run, fail, and the probability
density of the remaining lifetimes s in the fraction 1 - F(T) that does not fail
will be


    $f(s + T)/[1 - F(T)]$,   $s \ge 0$.    (2.3)


Thus the mean useful life, $\mu_T$ say, is given by


    $\mu_T = E(s) = \int_0^\infty s f(s + T)\,ds \,\big/\, [1 - F(T)]$.    (2.4)


When T = 0, $\mu_T = \mu_0$.

Then the general problem is: given such a life distribution, does there exist
a T > 0 such that


    $\mu_T > \mu_0$    (2.5)


and if so, what is the value of F(T) then? Since F(T) increases monotonically
with T, the practical problem when such a T exists is: what is the smallest T
for which $\mu_T$ is “substantially” greater than $\mu_0$?

Intuitive considerations suggest that if f(t) has a high spike near the origin,
a T such that $\mu_T > \mu_0$ should exist just to the right of this spike. Again
if f(t) → 0 sufficiently slowly as t → $\infty$, some large value of T should exist for
which $\mu_T > \mu_0$. Furthermore, it might be thought that simple conditions for
$\mu_T > \mu_0$ might be found in terms of the conditional failure rate at time t (compare
with (2.3))


    $h(t) = \dfrac{f(t)}{1 - F(t)}$.    (2.6)

Defining


    $H(t) = \int_0^t h(x)\,dx$,    (2.7)


we have


    $F(t) = 1 - \exp(-H(t))$,    (2.8)


    $f(t) = h(t) \exp(-H(t))$.


Since $h(t) \ge 0$, all $t \ge 0$, H(t) never decreases as t increases. Moreover since
F(t) → 1 as t → $\infty$, H(t) → $\infty$ as t → $\infty$. Then it is easily shown that


    $\mu_T = \exp(H(T)) \int_0^\infty \exp(-H(t + T))\,dt$,    (2.9)


    $\mu_0 = \int_0^\infty \exp(-H(t))\,dt$.    (2.10)
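
Formula (2.9) lends itself to direct numerical evaluation once H(t) is known. The following is a minimal sketch, assuming Simpson's rule on a truncated range; the function name `mean_useful_life` and the defaults `upper` and `steps` are our own illustrative choices and would need adjusting for slowly decaying tails.

```python
import math


def mean_useful_life(H, T, upper=60.0, steps=60000):
    """Approximate mu_T = exp(H(T)) * integral_0^upper exp(-H(t + T)) dt.

    Simpson's rule is used; `upper` truncates the infinite range and must be
    large enough for the neglected tail to be negligible.
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = upper / steps
    total = math.exp(-H(T)) + math.exp(-H(upper + T))
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * math.exp(-H(i * h + T))
    return math.exp(H(T)) * total * h / 3.0


# Constant failure rate, H(t) = t: the mean useful life does not depend on T.
print(mean_useful_life(lambda t: t, 0.0), mean_useful_life(lambda t: t, 2.0))

# Increasing failure rate, H(t) = t**2: testing shortens the mean useful life.
print(mean_useful_life(lambda t: t * t, 0.0), mean_useful_life(lambda t: t * t, 1.0))
```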


Distributions in which the hazard is non-decreasing have been called sub- 
exponential by Tukey (1958). Knight (1959) has examined this class of distri- 
butions in detail. His motivation is the plausible notion that all life distributions 
ought to belong to it. The basis of the present paper is that this is not necessarily 
so. However the following results, due to Knight, are relevant to our enquiry: 

(i) if the life distribution is subexponential, $\mu_T \le \mu_0$,

(ii) for any distribution


    $d\mu_T/dT = \mu_T h(T) - 1$,


(iii) if $\mu_T$ never increases for all T, the coefficient of variation of the life
distribution cannot exceed unity.
For our purposes, Knight's result (i) is best written:


    if h(t) steadily decreases for all t, $\mu_T > \mu_0$ for all T,
                                                                        (2.11)
    if h(t) never decreases for any t, $\mu_T \le \mu_0$, all T.

While result (2.11) is stronger than is necessary, it does decide the issue for
most well-known life distributions. The conspicuous exception is the log-normal
distribution. If, as is usual, h(t) has a derivative, (2.11) depends on the sign
of h'(t). But from (2.8) it is seen that


    $h'(t) = \dfrac{1}{1 - F(t)} \left\{ f'(t) + \dfrac{f^2(t)}{1 - F(t)} \right\}$.    (2.12)


The only part of (2.12) that can be negative is f'(t). Thus, h'(t) can be negative
only if f'(t) is negative. In particular, if h'(t) is to be negative for all t, as in the
first line of (2.11), f(t) must have its mode at the origin. Further, for a unimodal
distribution like the log-normal with f(0) = 0, f'(t) is non-negative for all values
of t up to the mode and non-positive thereafter. Since


    $h'(\mathrm{mode}) = \dfrac{f^2(\mathrm{mode})}{[1 - F(\mathrm{mode})]^2}$,    (2.13)


h'(t) will never become negative for values of t smaller than the mode.

From Knight's second result, it is clear that a longer mean useful life than
that for T may be found if at T


    $h(T) > \dfrac{1}{\mu_T}$.    (2.14)


This is a natural condition, for if an event occurs in Bernoulli trials with
probability p, its mean recurrence time is 1/p, so that the inequality (2.14) suggests
that lower failure rates will be found for larger values of T.

To find conditions under which $\mu_T > \mu_0$ will always be true for some small
values of T, it is only necessary to examine $d\mu_T/dT$ at T = 0 since $\mu_T$ is a
continuous function of T. But


    $\left[ d\mu_T/dT \right]_{T=0} = \mu_0 f(0) - 1$,    (2.15)


so that the necessary condition is

    $f(0) > \dfrac{1}{\mu_0}$.    (2.16)

This condition is trivially satisfied for any density that is infinite at the origin.
Conversely if f(0) = 0, very small values of T will never make $\mu_T > \mu_0$.
To show that $\mu_T$ will always be greater than $\mu_0$ for sufficiently large T if
f(t) → 0 (as t → $\infty$) slowly enough, it suffices to assume a simple form for f(t)
when t is large. Thus it will be assumed that


f® = + 0(£) as [7 @, (2.17) 
(Pareto's distribution, often mentioned in relation to income distributions, has
the density function

    $f(t) = \dfrac{a - 1}{t_0} \left( \dfrac{t_0}{t} \right)^{a}$,   $t \ge t_0$,

    $f(t) = 0$,   $t < t_0$.)

For $\mu_0$ to be finite, it is necessary that a > 2. It may be shown that


    $\mu_T = \dfrac{a - 1}{a - 2}\,T + o(T) - T$.    (2.18)


Since (2.18) tends to infinity as T tends to infinity, $\mu_T$ can certainly be made
greater than $\mu_0$ by making T large enough.
If, for some T, $\mu_T > \mu_0$, the fraction of the product surviving the test and
having the mean useful life $\mu_T$ is given by

    Fraction surviving test $= e^{-H(T)}$.    (2.19)

A comparison of (2.19) with (2.9) shows that

    [Fraction surviving test][mean useful life] $= \int_0^\infty e^{-H(t+T)}\,dt$.    (2.20)


But the integral in (2.20) clearly tends to zero as T → $\infty$, since H(t + T) → $\infty$,
for every t, as T → $\infty$. In other words the mean useful life increases more slowly
than the fraction surviving decreases. This puts a limit on the possible value
of this testing procedure.

Finally, the actual distribution of useful lives L (of the surviving fraction)
is given by

    $\mathrm{Prob}(L < t) = 1 - e^{-[H(t+T) - H(T)]}$,   $0 \le t < \infty$.    (2.21)

If h(t) is non-increasing, this probability is less than

    $\mathrm{Prob}(L < t \mid T = 0) = 1 - e^{-H(t)}$.


Thus the improvement in the mean is not obtained by the retention of a small 
fraction of very long-lived items; there is an over-all improvement in the life 
distribution. As a check on the symmetry of the distribution of the surviving 
fraction, it is useful to use the fact that 


Prob (L > wr) > & 7”. (2.22) 


III. PARTICULAR CASES


For the exponential distribution, $f(t) = e^{-t}$, h(t) = 1, H(t) = t, and $\mu_T$ is always
equal to $\mu_0$. Thus no initial testing will change the mean useful life. Here and
in the examples below, all distributions will be taken in their simplest forms.

For the Weibull distribution,

    $f(t) = p\,t^{p-1} e^{-t^{p}}$,   $(p > 0,\ 0 \le t < \infty)$    (3.1)

so that


    $h(t) = p\,t^{p-1}$.    (3.2)


In this case, the conditional failure rate steadily increases as t increases if p > 1.
If p = 1, this is the exponential distribution. If 0 < p < 1, the conditional
failure rate steadily decreases as t increases. Thus, the application of (2.11)
immediately gives the results. Thus for the Weibull distribution,

    p < 1,   $\mu_T > \mu_0$,   all T > 0,
    p = 1,   $\mu_T = \mu_0$,   all T > 0,    (3.3)
    p > 1,   $\mu_T < \mu_0$,   all T > 0.
For the gamma distribution,


    $f(t) = \dfrac{t^{k-1} e^{-t}}{\Gamma(k)}$,   t > 0.    (3.4)


In this case, h(t) and H(t) are difficult to discuss analytically. It is, however,
possible to settle the question from first principles. The mean life time of the
distribution (3.4) is $\mu_0 = k$, and


    $\mu_T = \int_T^\infty t^{k} e^{-t}\,dt \,\Big/ \int_T^\infty t^{k-1} e^{-t}\,dt \; - \; T$.


Elementary analysis shows that

    $\mu_T > \mu_0$  if  k < 1,
    $\mu_T = \mu_0$  if  k = 1,
    $\mu_T < \mu_0$  if  k > 1.


These conditions are very similar to those for the Weibull distribution.

Extreme value distributions depend on the tail behavior of the parental
population. If it is exponential, one is led to what is often called the extreme value
distribution where


    $f(t) = \dfrac{e^{-t}\, e^{e^{-t}}}{e - 1}$,   $(0 \le t < \infty)$    (3.6)


    $F(t) = \dfrac{e - e^{e^{-t}}}{e - 1}$,    (3.7)


    $h(t) = \dfrac{e^{-t}\, e^{e^{-t}}}{e^{e^{-t}} - 1}$.    (3.8)


To apply condition (2.11), it is necessary to examine h'(t). Putting $u = e^{-t}$,


    $h'(t) = \dfrac{dh}{du}\,\dfrac{du}{dt} = -\dfrac{u e^{u} (e^{u} - 1 - u)}{(e^{u} - 1)^2}$,

which is always negative since u is positive for $0 \le t < \infty$. Thus for the extreme
value distribution (3.6), the conditional failure rate always decreases and $\mu_T > \mu_0$
for every T.

In this case $\mu_T/\mu_0$ increases monotonically to a limiting value. It is easy to
show that


    $\mu_T = \sum_{n=0}^{\infty} \dfrac{e^{-(n+1)T}}{(n+1)\,(n+1)!} \Big/ \left( e^{e^{-T}} - 1 \right)$,


    $\mu_0 = \sum_{n=0}^{\infty} \dfrac{1}{(n+1)\,(n+1)!} \Big/ (e - 1)$.    (3.9)


As T → $\infty$, $e^{-T}$ → 0 so that $\mu_T$ → 1 and $\mu_T/\mu_0$ → $1/\mu_0$.
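
The limiting behaviour just described can be checked numerically. The sketch below assumes the series form of the mean useful life for the density $e^{-t} e^{e^{-t}}/(e - 1)$, namely a power series in $u = e^{-T}$ with terms $u^n/(n \cdot n!)$ divided by $e^{u} - 1$; the function name and the number of terms are our own choices.

```python
import math


def extreme_value_mean_useful_life(T, terms=60):
    """mu_T for the density exp(-t) * exp(exp(-t)) / (e - 1) on t >= 0.

    Sums u**n / (n * n!) with u = exp(-T) and divides by exp(u) - 1.
    Very large T makes u underflow to zero, which this sketch does not handle.
    """
    u = math.exp(-T)
    series = 0.0
    term = 1.0  # holds u**n / n! after the n-th update
    for n in range(1, terms + 1):
        term *= u / n
        series += term / n
    return series / math.expm1(u)


mu_0 = extreme_value_mean_useful_life(0.0)
for T in (0.0, 1.0, 5.0, 20.0):
    mu_T = extreme_value_mean_useful_life(T)
    print(T, round(mu_T, 6), round(mu_T / mu_0, 6))
```

Using `math.expm1` avoids the loss of accuracy in $e^{u} - 1$ when u is small, which is exactly the regime of interest as T grows.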


The final case to be examined is the log-normal distribution. A detailed
discussion of this distribution has been given by Aitchison and Brown (1957). Here
it is assumed that

    $\log_e t$ is $N(\mu, \sigma^2)$.


The mean and variance of this distribution are given by


    $\mu_0 = e^{\mu + \sigma^2/2}$,

    $\mathrm{Var}(t) = e^{2\mu + \sigma^2} (e^{\sigma^2} - 1)$.    (3.10)


    Coefficient of Variation $= \sqrt{e^{\sigma^2} - 1}$.    (3.11)


If $\sigma$ is large,


    Coefficient of Variation $\approx e^{\sigma^2/2} = \mu_0 / e^{\mu}$.

The median of the distribution is $e^{\mu}$.
If


    $\phi(y) = \dfrac{1}{\sqrt{2\pi}}\, e^{-y^2/2}$,   $\Phi(x) = \int_{-\infty}^{x} \phi(y)\,dy$,    (3.12)


this means that


    $f(t) = \phi\!\left( \dfrac{\log t - \mu}{\sigma} \right) \dfrac{1}{\sigma t}$.    (3.13)


Differentiating (3.13) and equating f'(t) to zero shows that


    mode $= e^{\mu - \sigma^2}$.    (3.14)


Thus the curve (3.13) has the form given in Figure 1.


Figure 1. The lognormal frequency function (horizontal axis marked at the mode
$e^{\mu - \sigma^2}$, the median $e^{\mu}$ and the mean $e^{\mu + \sigma^2/2}$).


Moreover


    $f(\mathrm{mode}) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{\sigma^2/2 - \mu}$.    (3.15)

Thus, for fixed $\mu$, an increase in $\sigma$ moves the mode towards the origin and
heightens it (if $\sigma > 1$) while the mean increases. The important fact is that
these movements are exponentially fast so that the curve rapidly becomes
extraordinarily skew. By analogy with the general result of Section 2 for an
infinite mode at the origin, one would guess that $\mu_T > \mu_0$ for some small T if $\sigma$
is large enough.

From (3.13), it is easily shown that


    $F(t) = \Phi\!\left( \dfrac{\log t - \mu}{\sigma} \right)$    (3.16)


and


    $h(t) = \dfrac{\phi\!\left( \frac{\log t - \mu}{\sigma} \right)}{\sigma t \left[ 1 - \Phi\!\left( \frac{\log t - \mu}{\sigma} \right) \right]}$.    (3.17)

Hence


    $h'(t) = \dfrac{(1 - \Phi)(\phi' - \sigma\phi) + \phi^2}{t^2 \sigma^2 (1 - \Phi)^2}$,


and since $\phi'(x) = -x\phi(x)$, the substitution $k\sigma^2 = \log t - \mu$ gives


    $h'(t) = \dfrac{\phi(k\sigma) \{ \phi(k\sigma) - [1 - \Phi(k\sigma)](1 + k)\sigma \}}{\sigma^2 t^2 [1 - \Phi(k\sigma)]^2}$.    (3.18)


If k < 0, h'(t) is certainly positive. This is a general consequence of (2.12).
If k (i.e., t) is sufficiently large, h'(t) is negative since the numerator of (3.18)
is asymptotically $-(1/k)\,\phi^2(k\sigma)$. When $\sigma$ is large, h'(t) = 0 when $k \approx \sigma^{-2}$, i.e.,
h'(t) < 0 almost as soon as the median (k = 0) is passed. Thus, for no pair of
values $(\mu, \sigma^2)$ is h'(t) one-signed for all t and condition (2.11) cannot be applied.
Hence, with only the present theory, the log-normal distribution must be
examined numerically and this is done in Section 4. Some guidance, however, may
be obtained by using well-known asymptotic results for the normal distribution.
The mean useful life is given by


    $\mu_T = \dfrac{\int_T^\infty \left[ 1 - \Phi\!\left( \frac{\log t - \mu}{\sigma} \right) \right] dt}{1 - \Phi\!\left( \frac{\log T - \mu}{\sigma} \right)}$.    (3.19)


Setting $T = \exp(\mu + K\sigma^2)$, it may be shown that

    $\dfrac{\mu_T}{\mu_0} = \dfrac{1 - \Phi((K - 1)\sigma)}{1 - \Phi(K\sigma)} - e^{(K - \frac{1}{2})\sigma^2}$.    (3.20)


The analytic problem is then to study the graphs of the right-hand side of (3.20)
against K for various values of $\sigma$.

As T → 0, K → $-\infty$, for $\mu$, $\sigma^2$ fixed. When K is large and negative,
set K = -K'. Then, as K' → $\infty$,


    $\Phi(K\sigma) \sim \dfrac{1}{\sqrt{2\pi}}\,\dfrac{e^{-K'^2\sigma^2/2}}{K'\sigma}$,   $1 - \Phi((K - 1)\sigma) \sim 1 - \dfrac{1}{\sqrt{2\pi}}\,\dfrac{e^{-(K'+1)^2\sigma^2/2}}{(K' + 1)\sigma}$,

so that

    $\dfrac{\mu_T}{\mu_0} \sim 1 + \dfrac{e^{-K'^2\sigma^2/2}}{\sqrt{2\pi}\,K'\sigma} - \dfrac{e^{-(K'+1)^2\sigma^2/2}}{\sqrt{2\pi}\,(K'+1)\sigma} - e^{-(K' + \frac{1}{2})\sigma^2} < 1$.    (3.21)


Thus $\mu_T/\mu_0$ → 1, from below, as T → 0.
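
The ratio of mean useful life to original mean life for the log-normal case is easy to tabulate from the closed form $[1 - \Phi((K-1)\sigma)]/[1 - \Phi(K\sigma)] - e^{(K - 1/2)\sigma^2}$ with $T = \exp(\mu + K\sigma^2)$. The following is a minimal sketch using only the standard library; the function names are our own, and for very large $K\sigma$ the upper tail underflows to zero, which the sketch does not guard against.

```python
import math


def upper_tail(z):
    """1 - Phi(z) for the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def lognormal_ratio(K, sigma):
    """mu_T / mu_0 for the log-normal life distribution, with T = exp(mu + K*sigma**2)."""
    survivors = upper_tail((K - 1.0) * sigma) / upper_tail(K * sigma)
    return survivors - math.exp((K - 0.5) * sigma ** 2)


for sigma in (1.0, 2.0, 4.0):
    row = [round(lognormal_ratio(K, sigma), 4) for K in (-3.0, -1.0, 0.0, 0.5, 1.0)]
    print(sigma, row)
```

Tabulating against K rather than T keeps the comparison between different values of $\sigma$ on a common scale, since K = 0 always marks the median.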
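As T → $\infty$, K → $\infty$, the behaviour of $\mu_T/\mu_0$ requires the use of more terms
in the asymptotic expansion of $\Phi$; as y → $\infty$,


    $1 - \Phi(y) \sim \dfrac{1}{\sqrt{2\pi}}\,\dfrac{e^{-y^2/2}}{y} \left[ 1 - \dfrac{1}{y^2} + \dfrac{3}{y^4} - \cdots \right]$.    (3.22)

Using (3.22), it is found that, as K → $\infty$ (or as $\sigma$ → $\infty$),

    $\dfrac{\mu_T}{\mu_0} \sim e^{(K - \frac{1}{2})\sigma^2} \left\{ \dfrac{K}{K - 1} \cdot \dfrac{1 - \frac{1}{(K-1)^2\sigma^2} + \cdots}{1 - \frac{1}{K^2\sigma^2} + \cdots} - 1 \right\}$.    (3.23)

Thus, for a fixed $\sigma^2$,

    $\dfrac{\mu_T}{\mu_0} \sim e^{(K - \frac{1}{2})\sigma^2} \left[ \dfrac{1}{K} + \dfrac{1}{K^2} + \cdots \right] \to \infty$   as K → $\infty$.    (3.24)


Hence, for fixed $\sigma^2$, $\mu_T/\mu_0$ can be made as large as is desired by increasing K
sufficiently.
For K fixed and greater than 1, and $\sigma$ → $\infty$, (3.23) gives

    $\dfrac{\mu_T}{\mu_0} \sim e^{(K - \frac{1}{2})\sigma^2} \left[ \dfrac{1}{K - 1} - \dfrac{2K - 1}{K (K - 1)^3 \sigma^2} + \cdots \right]$.    (3.25)


Hence, for any fixed K > 1, $\mu_T/\mu_0$ can be made arbitrarily large by making
$\sigma$ large enough.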
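If K = 1 and $\sigma$ → $\infty$,


    $\dfrac{\mu_T}{\mu_0} = \dfrac{\frac{1}{2}}{1 - \Phi(\sigma)} - e^{\sigma^2/2}$,


    $\dfrac{\mu_T}{\mu_0} \sim e^{\sigma^2/2} \left[ \dfrac{\sqrt{2\pi}\,\sigma}{2} - 1 \right]$,    (3.26)


so that for K = 1, $\mu_T/\mu_0$ → $\infty$ as $\sigma$ → $\infty$.

If 0 < K < 1, and $\sigma$ → $\infty$,


    $\dfrac{\mu_T}{\mu_0} \sim \dfrac{1 - \Phi((K - 1)\sigma)}{\frac{1}{\sqrt{2\pi}}\,\frac{e^{-K^2\sigma^2/2}}{K\sigma} \left[ 1 - \frac{1}{K^2\sigma^2} + \frac{3}{K^4\sigma^4} - \cdots \right]} - e^{(K - \frac{1}{2})\sigma^2}$,


    $\dfrac{\mu_T}{\mu_0} \sim \sqrt{2\pi}\,K\sigma\,e^{K^2\sigma^2/2} - e^{(K - \frac{1}{2})\sigma^2}$.    (3.27)

Since $\frac{1}{2}(K - 1)^2 > 0$ for all K ≠ 1, $K^2/2 > K - \frac{1}{2}$ for all K ≠ 1 and therefore
for 0 < K < 1. Thus, $\mu_T/\mu_0$ → $\infty$, as $\sigma$ → $\infty$.

When K = 0, and $\sigma$ → $\infty$,


    $\dfrac{\mu_T}{\mu_0} = 2\,\Phi(\sigma) - e^{-\sigma^2/2}$,


    $\dfrac{\mu_T}{\mu_0} \sim 2 - e^{-\sigma^2/2} \left( 1 + \dfrac{\sqrt{2}}{\sqrt{\pi}\,\sigma} \right)$,    (3.28)


so that $\mu_T/\mu_0$ → 2 as $\sigma$ → $\infty$.
When K < 0, and $\sigma$ → $\infty$, $\mu_T/\mu_0$ is given by (3.21) so that $\mu_T/\mu_0$ → 1, from
below, as $\sigma$ → $\infty$.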
These results suggest that the family of curves for $\mu_T/\mu_0$ against K for an
increasing sequence of values of $\sigma$ have the following properties:
(i) All curves go into ($-\infty$, 1) from below,
(ii) For larger $\sigma$, more of the curve for $-\infty$ < K < 0 is under, but
closer to, the line $\mu_T/\mu_0 = 1$ than for smaller $\sigma$,
(iii) For larger $\sigma$, the curves rise more steeply from just to the left
of (0, 1) to just under and nearer (0, 2).


FIG. 2. WEIBULL, p = ½: μ_T/μ_0 vs. T

FIG. 3. WEIBULL, p = ½: FRACTION vs. T
It will be noted that

mode corresponds to K = −1,
median corresponds to K = 0, (3.30)
mean corresponds to K = ½.

Thus the median is the most critical point. However, the above results give
no indication of the value of K, for each σ, at which the curve crosses the line
μ_T/μ_0 = 1.
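The crossing point can nevertheless be located numerically. The following is a minimal sketch, not part of the original analysis, assuming that the right-hand side of (3.20) has the form [1 − Φ((K − 1)σ)]/[1 − Φ(Kσ)] − exp((K − ½)σ²), which is consistent with the limiting results above. The function names and the bracketing interval are illustrative choices, and the caller must supply an interval on which the ratio passes from below 1 to above 1.

```python
import math


def upper_tail(x):
    """1 - Phi(x) for the standard normal, via erfc for accuracy in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def improvement_ratio(K, sigma):
    """Assumed form of (3.20): mu_T/mu_0 for a log-normal life distribution."""
    return (upper_tail((K - 1.0) * sigma) / upper_tail(K * sigma)
            - math.exp((K - 0.5) * sigma ** 2))


def crossing_K(sigma, lo, hi, iterations=200):
    """Bisection for the K in [lo, hi] at which mu_T/mu_0 = 1.

    The ratio must be below 1 at lo and above 1 at hi.
    """
    if not (improvement_ratio(lo, sigma) < 1.0 < improvement_ratio(hi, sigma)):
        raise ValueError("bracket must straddle the line mu_T/mu_0 = 1")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if improvement_ratio(mid, sigma) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# For sigma = 1 the ratio is below 1 at K = -1 and above 1 at K = 0.
k_star = crossing_K(sigma=1.0, lo=-1.0, hi=0.0)
print(k_star, improvement_ratio(k_star, 1.0))
```

For small σ the ratio is still below 1 at the median, so the bracket has to be moved to larger K; bisection is used here only because it needs no derivative of the ratio.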


IV. NUMERICAL RESULTS


FIG. 4. WEIBULL, p = 2: μ_T/μ_0 vs. T

FIG. 5. WEIBULL, p = 2


The condition on the parameter p of the Weibull distribution necessary in
order to obtain a longer mean useful life is given by (3.3). For the case p = ½,
Figure 2 shows the relative improvement R (i.e., R = μ_T/μ_0) plotted as a function
of the truncation or testing time T. This graph shows, for example, that choos-
ing T = 1 gives R = 2, which means the new mean useful life is twice the origi-
nal life. The amount of the original population which remains, in the long run,
after the units have been on life test for time T is given by Figure 3. Stopping
at T = 1 to obtain an R = 2 means that in the long run there will be about
36% of the original population surviving.
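The reading R = 2 at T = 1 can be checked numerically. The sketch below is an illustration only: it assumes a Weibull survival function of the form exp(−t^p) with unit scale and p = ½, and computes the mean remaining life of the survivors by Simpson's rule. The function names, the truncation width and the step count are assumptions made for the example.

```python
import math


def weibull_survival(t, p):
    """Fraction surviving to time t, assuming survival function exp(-t**p)."""
    return math.exp(-(t ** p))


def weibull_mean_residual_life(T, p, width=80.0, n=40000):
    """E[X - T | X > T] for T > 0, by Simpson's rule with n (even) intervals.

    With u = t**p, the integral of the survival function from T to infinity
    becomes the integral of (1/p) * u**(1/p - 1) * exp(-u) from T**p upward;
    the negligible tail beyond T**p + width is dropped.
    """
    a = T ** p
    h = width / n

    def g(u):
        return (1.0 / p) * u ** (1.0 / p - 1.0) * math.exp(-u)

    total = g(a) + g(a + width)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return (total * h / 3.0) / weibull_survival(T, p)


def relative_improvement(T, p):
    """R = mu_T / mu_0, with mu_0 = Gamma(1 + 1/p) the untruncated mean."""
    return weibull_mean_residual_life(T, p) / math.gamma(1.0 + 1.0 / p)


R = relative_improvement(1.0, 0.5)
surviving = weibull_survival(1.0, 0.5)
print(R, surviving)
```

Under these assumptions R comes out very close to 2 and the surviving fraction is exp(−1), a little under 0.37, in line with the values read from Figures 2 and 3.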

Figure 4 shows the relative improvement factor for the Weibull distribution 
with p = 2. From (3.3), the new mean useful life is less than the original mean 
life and hence, R is always less than one. Stopping at T = 1 for this case gives 
an R of .38 so that the new mean useful life is approximately 1/3 of the
original life. Figure 5 gives the fraction of the original population surviving at 
time T for this case. 

From these figures, it is seen that for the Weibull distribution where p is
such that an improved mean useful life can be obtained, the accomplishment of
this can be costly in the number of units that must be thrown away. It is also
seen that this sort of test will result in a large reduction in the mean useful life
if the parameter p is greater than 1.

Reference (3) gives some sample values of the parameter p for the Weibull 
distribution obtained from life tests of electronic tubes. The values of p range 
from 1.3 to 1.9. From the conclusions drawn here, lots of units of this type
could not be improved by a test of the type described here. Values of 0 < p < 1
indicate a high failure rate for small t and hence, one would expect to obtain
an improved mean useful life by eliminating the weak members of the population.

The situation for the gamma distribution is much the same as for the Weibull 
distribution. Figures 6 and 8 give the relative improvement R for k = 3 and 3
respectively and Figures 7 and 9 give the fraction of the population remaining
after testing to time T. The gamma distribution has been used as a life model
in (2) and in other places. 

It has been shown for the extreme value distribution that an improvement
can always be obtained for T > 0. The amount of improvement is shown in
Figure 10 and the fraction surviving is shown in Figure 11 for the extreme value
distribution. Figure 10 shows that μ_T/μ_0 does approach the asymptotic value
1/μ_0 as shown before.

For the case of the log-normal distribution, the conclusions are not as general
as for the other distributions considered. Figure 12 gives the relative improve-
ment factor as a function of K, where K determines the value of T by the relation

T = exp(μ + Kσ²).

The point K = 0 corresponds to the median of the log-normal distribution.
From Figure 12 it is seen that for σ > 1, some very large improvement factors
can be obtained by testing for a time which corresponds to a K > 0 (i.e., the
50% point of the distribution). The fraction of the original distribution which
remains after the test is given in Figure 13 for the corresponding values of K.
Figures 13 and 14 give the quantities as a function of T.

As an example of the interpretation of these graphs, consider the case σ = 1,
μ = 0 and K = 2. This means that testing until time T = e² ≈ 7.4 gives an
improvement factor of R = 2.5 but only about 2% of the original population
remains at that time. However, if σ = 2, μ = 0 and K = .5, one obtains an
R = 4.3. This means testing for a time T = e² ≈ 7.4 and about 16% of the
original population survives the test. Some comparative numerical values are
given in Table I.
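The two cases just quoted can be reproduced with a few lines of code. This is a sketch under stated assumptions rather than the computation used for the figures: it takes T = exp(μ + Kσ²) with μ = 0, and assumes the ratio μ_T/μ_0 equals [1 − Φ((K − 1)σ)]/[1 − Φ(Kσ)] − exp((K − ½)σ²), with 1 − Φ(Kσ) as the fraction surviving the test. The function names are illustrative.

```python
import math


def upper_tail(x):
    """1 - Phi(x) for the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def lognormal_improvement(K, sigma):
    """Return (R, fraction of the original population surviving the test)."""
    surviving = upper_tail(K * sigma)
    R = (upper_tail((K - 1.0) * sigma) / surviving
         - math.exp((K - 0.5) * sigma ** 2))
    return R, surviving


results = {}
for K, sigma in [(2.0, 1.0), (0.5, 2.0)]:
    T = math.exp(K * sigma ** 2)  # testing time when mu = 0 (assumed relation)
    R, surviving = lognormal_improvement(K, sigma)
    results[(K, sigma)] = (T, R, surviving)
    print(f"sigma={sigma}, K={K}: T={T:.1f}, R={R:.2f}, surviving={surviving:.2f}")
```

Under these assumptions both cases give T close to 7.4, with R near 2.5 and about 2% surviving in the first case, and R near 4.3 and about 16% surviving in the second.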

The conclusion for the log-normal is that relative improvements of any size 
can be obtained, but the value of σ determines how costly this improvement
will be. Since R seems to be very sensitive to the value of σ it would be very
desirable to have a good estimate of σ before deciding to make a test of this type.


TABLE I
The Effect of σ on the Log-normal Distribution


  σ     μ_T/μ_0     Fraction Surviving Test     Fraction of New Population < μ_T

          .49              .45                          .64
         1.14              .40                          .69
         2.66              .31                          .815
         4.30              .23                          .885


Book Review


A STATISTICAL MANUAL FOR CHEMISTS, by E. L. Bauer, Academic Press, 1960.





According to its preface, this manual was written “for chemists who perform experiments, 
make measurements, and interpret data. . . . The book is intentionally elementary in content
and method. It is not meant to be a complete text on statistical techniques, but rather a
manual for the working chemist”.

Most manuals are for workers skilled in the subject matter with which each deals. Manuals 
for beginners usually confine their recommendations to solutions of simple problems and to 
relatively fool-proof methods. This manual falls in neither category. Indeed, considering its 
serious omissions, its frequent errors and its multitude of misprints, it is doubtful that there 
are any working chemists who can use the book with profit. 

There will be little point in giving a full list of omissions, errors and misprints for readers 
who are not statisticians. But the chemist who is interested might ask a statistician of his 
acquaintance what he should think of an introductory manual on statistics for chemists which 
did not mention random variables, did not distinguish between expectations and sample 
averages, failed to make the distinction between statistics and population parameters in 
general, gave the formula for the standard error of a mean only once, and then erroneously, and did
not mention the central limit theorem. These are omissions that strike the eye in the first two 
chapters on Fundamentals and on The Average. 

Errors of commission are almost too numerous to classify. The author appears to be under 
the impression that the normal distribution “is the theoretical distribution of the relative 
frequency of a large number of observations made on the same object”. Using a symbol not
in his glossary of symbols, he gives a formula for the population standard deviation that is 
hardly remediable; a limit sign, a summation sign and a definition of the symbol n are missing. 
The accompanying figure uses another symbol for the population mean, one that is indeed in 
the glossary, but defined for another use. 

The section on Average Deviation uses the letter X to mean a deviation from the sample 
mean in its first line, and then changes without warning to using the same symbol for the 
value of a measurement in the formula given. The sum of the deviations from the mean as given
in the formula is identically zero. Some symbol should be used to indicate that the absolute
values of the deviations are to be summed. The author’s final remark, “The average deviation
is not an accurate measure of precision because it gives a bias to the measurements, making
them appear more precise than they really are.”, is a misunderstanding. The fact that the
sample average deviation is not an unbiased estimator of the population standard deviation 
is hardly an argument against its use by this author, since he uses the sample range through- 
out the rest of his book to estimate population standard deviation and variance. The fact 
that the average deviation is not as efficient as the range in small samples may be what is 
meant. 

The terms efficient and accurate are also interchanged in the discussion of the Range. (The 
ideas of efficiency and bias as applied to sample statistics are again confused in the section on 
Use of Tables I and II.) The use of Table V to estimate a population standard deviation from
an average range is in error. The divisor used, 2.48, is that required to get an unbiased esti- 
mate of the population variance. The correct divisor to use here is 2.33. This latter will not be 
found easily since the last two rows in Table V are mislabelled. 
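The estimate under discussion is simply the average subgroup range divided by a tabulated constant. The sketch below illustrates the arithmetic with the divisor 2.33 quoted above; that this divisor corresponds to subgroups of five observations is an assumption based on standard tables of range constants, and the data are hypothetical.

```python
def sigma_from_average_range(subgroups, divisor=2.33):
    """Estimate a population standard deviation as (average range) / divisor.

    The default divisor is the value quoted in the review; it is assumed
    here to apply to subgroups of five observations.
    """
    ranges = [max(group) - min(group) for group in subgroups]
    average_range = sum(ranges) / len(ranges)
    return average_range / divisor


# Hypothetical measurements, two subgroups of five.
data = [
    [10.1, 9.8, 10.4, 10.0, 9.9],
    [10.3, 10.2, 9.7, 10.1, 10.0],
]
estimate = sigma_from_average_range(data)
print(estimate)
```

Using 2.48 in place of 2.33 would shrink the estimate by roughly 6%, which is the size of the error the review points out.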

Chapter 3, on Experimental Design and the Analysis of Variance, is clearly written and 
makes several important distinctions between types of experimental situations. Although 
subscript notation is introduced without explanation, its use can be figured out from the 
context. 

The chapter on Comparison of Averages has errors in formulas and in exposition. The
two numbers of degrees of freedom defined on page 39 are transposed. The formula for t should
have the difference of the population means subtracted from its numerator and should have
the standard error of the numerator quantity in the denominator. The population mean that
was designated by two other symbols in Chapter 1 is now wrongly called X.

A full list of the regrettable errors that mar later chapters is not judged worth the reader's 
time. It should be clear from what has been said that this book cannot be recommended to chemists
who wish to use modern methods in handling their data. 

Cuthbert Daniel 




Editor’s Note 


On August 21, 1960 the Section on Physical and Engineering Sciences, Ameri-
can Statistical Association and the Institute of Mathematical Statistics presented
the program “Spectral Analysis of Time Series” as part of a joint meeting of
the two societies held at Stanford University. The program on spectral analysis
was organized by J. Cameron, E. Parzen and L. LeCam, and consisted of papers
presented by G. M. Jenkins and E. Parzen with discussion by J. W. Tukey
and N. R. Goodman. The papers, “General Considerations in the Analysis of
Spectra” by Dr. Jenkins and “Mathematical Considerations in the Estimation
of Spectra” by Dr. Parzen, plus two additional papers incorporating the major
points of the discussion: “Discussion, Emphasizing the Connection Between
Analysis of Variance and Spectrum Analysis” by Dr. Tukey and “Some Com-
ments on Spectral Analysis of Time Series” by Dr. Goodman, comprise the
first four papers in this issue of Technometrics. The replies of Messrs. Jenkins
and Parzen are also published. To augment these papers and discussion, two
additional papers: “Spectral Analysis Combining a Bartlett Window with an
Associated Inner Window” by T. A. Wonnacott and “Frequency Response
from Stationary Noise: Two Case Histories” by N. R. Goodman, S. Katz, B. H.
Kramer and M. T. Kuo also appear.

It is hoped that occasional future issues of Technometrics can provide similar 
groups of papers on special topics. To this end, the next issue of Technometrics 
will contain several articles on the general subject of factorial designs. 

To make room for the papers on Spectral Analysis the following articles, 
originally scheduled for this issue of Technometrics, will appear in the next 
issue: 

“The Optimum Allocation of Spare Components in Systems” by Donald F.
Morrison

“Use of Tables of Percentage Points of Range and Studentized Range”
by H. Leon Harter

The Editor is grateful to these authors for their willingness to postpone
publication.


Readers are requested to send complete
writeups of computer programs of interest to 
statisticians to Dr. Fred C. Leone, Director, 
Statistical Laboratory, Case Institute of 
Technology, 10900 Euclid Avenue, Cleveland 
6, Ohio, for consideration for publication in 
Technometrics. Each computer program 
announcement is to contain: 

1. A brief problem description, 2. The 
type of computer for which the program is 
applicable, 3. The author’s name and address, 
4. A brief description of the program in-
cluding: Limitations, auxiliary equipment
requirements, statements on accuracy, avail-
ability of sample problems, running time
estimates and program storage requirements.
All programs submitted for publication must 
be available for distribution subject only 
to nominal costs for the reproduction and 
mailing of program tapes, cards, etc. Inquiries 
about published programs should be ad- 
dressed to the authors. Comments and cri- 
tiques of the published programs may be sent 
to Dr. Leone and will be published when 
considered appropriate. 


General Linear Hypothesis, BIMED No. 14, 
June 1960, IBM 709 


This program analyzes the statistical 
significance of independent variables for 
those experimental designs that can be 
formulated in terms of the General Linear 
Hypothesis model. The variables that may 
be analyzed are of two general types: Classi- 
fication or analysis of variance variables, 
and regression variables or covariates. 

The model may include p analysis of 
variance variables and q covariates subject 
to the restrictions: p < 60; q < 60 and
l = p + q < 60. The program can analyze
unbalanced analysis of variance and co- 
variance designs according to stated hypoth- 
eses. As many as 9 hypotheses may be stated. 
The program can perform a transformation 
of the input data by means of a designation
in the Problem Card. This program was
successfully run. A sample problem is avail-
able, but no flow diagram is available.
Prof. W. J. Dixon, Div. of Biostatistics,
Dept. of Preventive Medicine & Public 
Health, School of Medicine, University of 
California at Los Angeles, Los Angeles 24, 
California. 


Analysis of Variance and Auxiliary Programs,
April 1960, IBM 650, Fortransit 


The following routines can be found in
this report: 1) Between-Subject Designs
(the Simple Randomized ANOVA, the
A X B Factorial ANOVA and the A X B X C
Factorial ANOVA), 2) Within-Subject De- 
signs: One within-subject dimension (the 
A X S ANOVA, the type I ANOVA and 
the type III ANOVA), 3) Within-Subject 
Designs: Two within-subject dimensions 
(the A X B X S ANOVA and the type VI 
ANOVA), 4) Auxiliary Programs (the table 
inversion, the column blocking and the 
reciprocal transformation programs) and 
5) Auxiliary Statistical Programs (the dis- 
tribution analysis of the A X S interaction 
and Bartlett test of homogeneity). 

The simple randomized ANOVA performs 
a simple analysis of variance on a set of 
fixed point data consisting of up to 25 groups 
with up to 500 scores per group. The A X B
factorial ANOVA permits up to 100 scores 
per group and up to 25 categories of the 
A and B dimensions. The A X B X C 
factorial ANOVA permits up to 100 scores 
per group, up to 10 rows and 10 columns, and 
up to 20 slices. The A X S ANOVA permits 
up to 200 levels of the A-dimension with no 
specific limitation on the number of sub- 
jects. The type I ANOVA will analyze A 
scores for each of the n_b subjects in each of
B groups. The A-dimension is a within- 
subjects dimension and the B-dimension is 
between subjects. The present program 
permits up to 100 levels of the A- and B-
dimensions. There is no specific restriction
on n_b. The type III program will analyze A
scores for each of n_bc subjects in each of BC
groups. The A-dimension is a within-subjects
dimension and the B- and C-dimensions are
between-subjects. The present program per-
mits up to 25 levels of A-, 10 levels of B-,
and 20 levels of C-dimension. There is no
specific restriction on the limit of n_bc. In
A X B X S ANOVA each of n subjects has 
a score for each of AB treatment combina- 
tions, where A is the number of levels of the 
A-dimension and B is the number of levels 
of the B-dimension. In the present program, 
up to 25 levels of the A-dimension and up 
to 10 levels of the B-dimension may be used. 
There is no specific limitation on n. The
type VI design involves C independent
“replications” of the A X B X S design.
Each of n_c subjects at a given level of the
C-dimension has a score on each of AB
combinations of the A- and B-dimensions.
The present program permits a maximum of
25 levels of the A-dimension, 10 levels of
the B-dimension, 20 levels of the C-dimension,
with no limitations on n_c.

The table inversion program is useful in 
converting data arranged for within-subject 
designs. It will permit up to 10 columns and 
up to 100 rows. The column blocking program 
will organize the data into blocks. It permits 
up to 500 columns with no limit on the 
number of subjects. This program was 
successfully run. Neither a sample problem 
nor a flow diagram is available. John P. 
Dolch, Director, Computer Center, State 
University of Iowa, Iowa City, Iowa. 


An Analysis of Variance for Trend, IU 124, 
IBM 650 


This program deals with Single and 
Multiple Group Analysis. Three methods for 
Single Group Analysis are discussed. Method 
A assumes that the group regression curve is a 
curve through the trial means. Individuals 
are assumed to have regression curves 
parallel to the group regression curve. Three 
sources of variation are considered: (1) inter- 
action (the error term), (2) between trials 
(used to test for trend), (3) between individual 
means. Method B assumes the least squares 
straight line fit to the points for an individual 
is used as an estimate of his regression curve. 
Four sources of variation are considered: 
(1) between individual means, (2) between 
individual slopes, (3) group slope (to test
for overall slope in the group), (4) individual
deviation from linearity (error term).

Method C combines methods A and B. 
The individual subjects may have different 
slope regression curves and the group slope 
may be curvilinear. The individual regression 
curve is a straight line fit adjusted by the 
size of the deviations of the group means 
from the group least squares straight line 
regression curve. There are five sources of 
variation considered: (1) between individual 
means, (2) between individual slopes, (3) 
group slope (to test whether there is an 
overall linear trend), (4) individual deviations
from estimation (error term), (5) group 
deviation from linearity (to test whether 
the group regression line deviates significantly 
from linearity). Finally, there is the total 
error, which is the same in all methods. The 
multiple group analysis is essentially the 
same as that in Method C. However, there 
are some added sources of variation because 
of the between-group consideration. The 
program is limited to 100 trials and 20 groups. 
Time is approximated by 


Time (seconds): 4.9K + 5.4NT
+ 1.2M − 62.7


where K = number of trials, NT = total 
number of subjects and M = number of 
groups. 
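As a small illustration of the timing formula, the following sketch evaluates it for given problem sizes and enforces the stated limits of 100 trials and 20 groups. The function name and the example sizes are hypothetical.

```python
def estimated_run_time_seconds(K, NT, M):
    """Approximate running time from the formula quoted in the abstract.

    K  = number of trials (at most 100)
    NT = total number of subjects
    M  = number of groups (at most 20)
    """
    if K > 100 or M > 20:
        raise ValueError("program is limited to 100 trials and 20 groups")
    return 4.9 * K + 5.4 * NT + 1.2 * M - 62.7


# Hypothetical problem: 10 trials, 40 subjects in all, 2 groups.
seconds = estimated_run_time_seconds(K=10, NT=40, M=2)
print(seconds)
```

Because of the constant term −62.7, the formula gives a negative value for very small problems, so it should be read as an approximation for problems of moderate size.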

This program was successfully run. Neither a
sample problem nor a flow diagram is avail-
able in the writeup. Director, Research
Computing Center, Indiana University, 
Bloomington, Indiana. 


The Balanced Lattice Square, October 1960, 
IBM 650 


Included are the analysis of variance and 
adjustment of treatment means for the 
experimental design known as the balanced 
lattice square. In this design, there are k²
treatments or variates tested in k + 1
arrangements in which any pair of treatments
occurs together once in a row and once in a
column. k may be less than or equal to 9.
Calculations are done in floating point. Time 
varies according to the size of the lattice 
square and according to the way in which 
the data are introduced into the computer, 
by cards or by paper tape. All the data are 
read into the machine prior to any output. 
Output on the 407 ranges from 1 to 3 minutes. 
This program was successfully run. Neither
a sample problem nor a flow diagram is
available. Director, Computing Center, Cor-
nell University, Ithaca, New York.


Case-1033, September 1960, Burroughs 220 
(5k) 


CASE is a general purpose statistical 
routine. It is an extension of the IBM 650 
Statistical Interpretive System (SIS) for use 
on the Burroughs 220 computer. CASE is 
modelled after the Bell System developed 
by Dr. V. M. Wolontis at Bell Laboratories 
(see IBM Technical Newsletter No. 11). It 
transforms a 5k Burroughs 220 into a three 
address floating decimal computer with 1000 
storage locations (excluding the actual 
interpretive system, special functions and so 
forth). This program is primarily suited for 
statistical and scientific calculations. Matrix 
operations (multiplication, inversion, etc.) 
are also available. Transformations on raw 
input data may be carried out just before 
the calculation of moments and cross products 
by a statistical read command. Up to 30 
variables and 10,000 observations can be 
processed in this manner. The report assumes 
no previous familiarity with stored program 
computers, but also describes the internal
structure of the system (including execution 
times) enabling possible modifications by the 
user. This program was successfully run. 
A sample problem and flow diagram are 
available. Director, Statistical Laboratory, 
Case Institute of Technology, Cleveland 6, 
Ohio. 


Contingency Table Programs for Statistical 
Analysis, January 1959, IBM 650 


The use of this program sequence permits 
rapid and efficient construction of contingency 
tables from data stored on punch cards. The 
number of tables which may be constructed, 
their size, the range of the data and the 
manner in which the data are represented 
(data card format) are variables which are 
specified by the users. The program generates 
all necessary input, output and control 
instructions required to meet the require- 
ments of a specific application. 

The limitations are as follows:


(A) The largest contingency table that can
be constructed is a 20 X 20 table, i.e.,
400 cells.

(B) The number of contingency tables
that can be constructed in one pass
ranges from six (20 X 20) tables to
thirty tables as long as the sum of the
number of cells in all tables does not
exceed 2400.

(C) No more than twenty range vectors 
may be used in one pass. 

(D) The field for the data variables must
not exceed five card columns.

(E) There is practically no limit to the 
number of data cards that can be 
processed in one pass. However, the 
number of contingencies (counts) in 
any one cell must be less than 100,000. 


This arbitrary assignment can be changed as 
long as the following condition is met: 


. + 5N < 1350 


where N is the number of contingency 
tables to be constructed (in one pass) and 
M is the total number of cells in all con- 
tingency tables. The change of arrangement 
can be accomplished by changing at most 
three cards in the SOAP II version of the 
programs and reassembling. This program 
was successfully run and a sample problem 
is available. Director, Computing Center, 
University of Houston, Houston 4, Texas. 


Chi Square Program, 6.0.502, IBM 650 


The objective of this program is to compute 
chi-square for contingency tables (matrices) 
up to nine by nine. It will accept data in the 
form of punched cards; each card representing 
one sample and containing the m categories 
of v variables. (All variables need not have 
the same number of “dependent” variables.)
The machine will compute chi-square for
each “dependent” variable paired with each
variable following that “dependent” variable.
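The computation described can be sketched in a modern language as follows. This is not the IBM 650 program; it is a minimal illustration that assumes categories are coded 1 to M on each record, builds one contingency table for a pair of variables, and applies the usual Pearson chi-square statistic. The function names and the sample records are hypothetical.

```python
def contingency_table(records, a, b, M):
    """Count an M x M table for variables a and b; categories are coded 1..M."""
    table = [[0] * M for _ in range(M)]
    for record in records:
        table[record[a] - 1][record[b] - 1] += 1
    return table


def chi_square(table):
    """Pearson chi-square and degrees of freedom, ignoring empty rows and columns."""
    rows = [r for r in table if sum(r) > 0]
    if not rows:
        raise ValueError("table contains no observations")
    keep = [j for j in range(len(table[0])) if sum(r[j] for r in rows) > 0]
    rows = [[r[j] for j in keep] for r in rows]
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(keep) - 1)
    return statistic, dof


# Hypothetical sample: one record per card, two variables with two categories.
records = [(1, 1)] * 10 + [(1, 2)] * 20 + [(2, 1)] * 20 + [(2, 2)] * 10
statistic, dof = chi_square(contingency_table(records, a=0, b=1, M=2))
print(statistic, dof)
```

To mimic the pairing described above, `chi_square` would be called once for each "dependent" variable paired with each variable that follows it.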

Using the notation: 


v = number of variables to be compared

k = number of “dependent” variables

m = number of categories of variables

M = number of categories to be compared
(or if all categories are to be used,
M is maximum number of cate-
gories of any variable), i.e., M =
largest m.

N = sample size. 


The restrictions on these quantities are 
as follows: 
1< dy M < 9 
R< Ese 
N < 1,000,000
M²k(2v − k − 1) < 2736.

This program was successfully run and a
flow diagram is available. Director, Statistical
and Computing Laboratory, University of
Michigan, Ann Arbor, Michigan.


Data Conversion Program, No. 6.0.504, IBM 
650 


This program will convert numbers from
a range ±A₁ to ±A₂ to a range +B₁ to
+B₂, where A₁ and A₂ are any 10 digit
numbers with the decimal points in the
same position on both, A₂ > A₁ and B₁ and
B₂ are any non-negative 10 digit numbers
with the decimal points in the same position
on both, B₂ > B₁. B₁ and B₂ are specified by
control cards and A₁ and A₂ are determined
from the raw data. At the option of the 
user the converted data may be punched 
out with 1, 2, 3 or 10 numbers to a 10-digit 
word. The converted numbers are in a form 
permitting their immediate use with other 
programs of 6.1.500. Each set of numbers is 
referred to as a “subject” and the numbers
comprising the set are referred to as “vari-
ables”. In converting data the range from
which numbers are to be converted is found 
by comparing the ith variable of each subject 
to determine a maximum and minimum. The 
maximum number of variables per subject 
is 100, the number of subjects is not limited. 
If more than 100 variables per subject are 
required, it will be necessary to “re-soap”
the program. Fixed point arithmetic is used.
The estimated time is
[: + .3(n + m)] seconds,


where n = number of variables per subject 
and m = total number of variables. This 
program was successfully run and a sample 
problem is available. Dr. Thomas A. Keenan, 
Director, Computing Center, University of 
Rochester, Rochester 20, N. Y. 
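The conversion described can be illustrated as below. The abstract does not state the mapping, so the sketch assumes a linear rescaling in which, for each variable, the minimum A₁ and maximum A₂ found across subjects are mapped to B₁ and B₂. The function name and data are hypothetical, and floating point is used in place of the program's fixed point arithmetic.

```python
def convert_range(subjects, b1, b2):
    """Rescale each variable from its observed range [A1, A2] to [b1, b2].

    subjects is a list of equal-length lists, one list of variables per subject.
    A linear mapping is assumed; A1 and A2 are found per variable from the data.
    """
    if not b2 > b1:
        raise ValueError("B2 must exceed B1")
    columns = list(zip(*subjects))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    for a1, a2 in zip(lows, highs):
        if not a2 > a1:
            raise ValueError("each variable needs A2 > A1")
    return [
        [b1 + (x - a1) * (b2 - b1) / (a2 - a1)
         for x, a1, a2 in zip(row, lows, highs)]
        for row in subjects
    ]


# Three hypothetical subjects with two variables each, converted to 0..100.
converted = convert_range([[0, 10], [5, 20], [10, 30]], b1=0, b2=100)
print(converted)
```

Each variable is rescaled separately, following the statement that the range is found by comparing the ith variable of each subject.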


Optimization Subroutine on “Direct Search”
Technique, Research Report 6-41210-4-R2, 
June 1960, IBM 704 FORTRAN II 


This subroutine provides a powerful means 
for optimizing a wide range of problems. 
Up to twenty independent variables may be 
used and may be subjected to several dif- 
ferent types of restrictions. To aid in the 
understanding of the operation of the subroutine,
three flow diagrams of different levels of
complexity—the Simplified Flow Chart, the
Generalized Flow Chart, and the Detailed
Flow Chart—are included. Both the Sim- 
plified and the Generalized Flow Charts are 
intended to show the operation of the sub- 
routine in conjunction with the calling 
program. The Detailed Flow Chart applies 
to the subroutine only, and is written in 
FORTRAN language. The reader is referred 
to Scientific Paper No. 10-1210-1-P1 (West- 
inghouse Research Labs.) for a discussion of 
the reasoning which leads to the optimization 
method used. This program was successfully 
run. A sample problem and flow diagram 
are available. Westinghouse Research Labo-
ratories, Pittsburgh, Pa.


NOTICES


GORDON RESEARCH CONFERENCE 
STATISTICS IN CHEMISTRY AND CHEMICAL ENGINEERING 
August 7-11, 1961, New Hampton, New Hampshire 

Carl A. Bennett, Chairman
J. Stuart Hunter, Vice Chairman 


PROGRAM 
Monday, 7 August. 
Morning: J. S. Hunter, Cuthbert Daniel, “Screening Experiments for
Factors.”
Evening: C. W. Dunnett, “Screening Experiments for Treatments.”


Tuesday, 8 August. 
Morning: J. W. Gorman, P. W. M. John, “Experiments with Mixtures.” 
Evening: R. W. Sommers, “Theoretical Models for Physical Problems.”


Wednesday, 9 August. 
Morning: Rutherford Aris, “Chemical Engineering Applications of Spectral
Analysis.”
Evening: Gwilym M. Jenkins, “Spectral Approach to Industrial Experi-
mentation.”





Thursday, 10 August. 
Morning: A. J. Duncan, “Bulk Sampling.” 
Evening: To be announced. 


Friday, 11 August. 
Morning: Marvin Zelen, “Robustness of Life Testing Procedures.”


The primary purpose of the Gordon Research Conferences is to provide an 
opportunity for free and unhurried discussion of major problems in areas of 
science where current research is uncovering new frontiers. The attendance at 
each conference is limited in number to encourage informal discussion and to 
make it easier for conferees to become acquainted. Those attending will be 
restricted in number to approximately 100. The conference continues for five
days, Monday through Friday noon. For applications to attend the Gordon 
Research Conference on Statistics in Chemistry and Chemical Engineering 
write: Dr. George W. Parks, University of Rhode Island, Kingston, Rhode Island. 


FORD FOUNDATION FELLOWSHIP PROGRAM 
NORTH CAROLINA STATE COLLEGE


The Science and Engineering Program of the Ford Foundation, convinced 
that one of the most serious problems immediately facing American engineering 
education is the recruitment and retention of highly qualified faculty personnel, 
has made a grant to North Carolina State College to encourage faculty re- 
cruitment via doctoral programs in engineering and science. Full time doctoral 
programs are available at North Carolina State College in the following fields:

School of Engineering:
    Ceramic Engineering
    Chemical Engineering
    Civil Engineering
    Electrical Engineering
    Mechanical Engineering
    Nuclear Engineering
School of Physical and Applied Mathematics:
    Applied Mathematics
    Physics
    Experimental Statistics


Terms of Fellowship: The fellowship shall be for a period of twelve months,
and may be renewed for succeeding periods of one year. The fellow may also 
be employed, but only as a teacher or research assistant, and may hold another 
fellowship. The fellowship grant is to be considered as one component of the 
total aid given to each applicant, adjusted so that the gross income for any 
combination of fellowships and permissible employment will not exceed $3,000 
plus tuition and fees per twelve month year. Eligibility Requirements: Admission 
to graduate study with the avowed purpose of earning a doctorate in engineering 
or science, serious interest in an academic career in engineering or science,
likelihood of completion of doctorate within three years, and a bona fide need for 
additional financial support. How to apply: Send to the Dean of Engineering, 
North Carolina State College, Raleigh, North Carolina, three copies of an 
application for admission to the Graduate School of North Carolina State 
College and one copy of an application for Ford Foundation fellowship-loan 
funds. Further information and application forms may be obtained from the 
Dean of Engineering at North Carolina State College. 


CORNELL UNIVERSITY INDUSTRIAL ENGINEERING SEMINARS 


Sponsor Department of Industrial and Engineering Administration, Sibley 
School of Mechanical Engineering, Cornell University 
Place Cornell University, Ithaca, New York 
Time June 13 through June 16, 1961 
Seminar Participants enroll in one of the following nine groups and also 
attend general sessions. 
Groups A. Industrial Management 
B. Engineering Administration 
C. Capital Investment Planning: Theory and Practice 
D. Operations Management of the Smaller Company 
E. Work Measurement
F. Systems Simulation Using Digital Computers 
G. Queuing and Inventory Theory 
H. Statistical Decision-Making: Theory and Applications 
I. Statistical Reliability Analysis: Theory and Applications 


Speakers Specialists from both industry and the staff of Cornell University 


For Whom Operating management personnel in line supervision and staff
positions in industrial engineering, production engineering,
engineering administration, operations research, research and
development, quality control, production control, cost control,
materials management, purchasing, data processing, and the
controller’s function.


For additional information write: 

J. W. Gavett, Seminars Coordinator 

Department of Industrial and Engineering Administration 
Upson Hall 

Cornell University 

Ithaca, New York 





PURDUE UNIVERSITY COURSES IN STATISTICAL METHODS 
AND ADVANCED QUALITY CONTROL 


There will be three intensive courses in the general areas of statistical methods 
and quality control, design of experiments, and operations research at Purdue 
University this summer. 

The course in Statistical Methods and Advanced Quality Control has been 
given at Purdue annually since 1947, being most recently revised in 1960. This 
course is designed for those who have had the equivalent of one of the intensive 
courses in statistical quality control given during and after the war, and who 
want to learn more about the statistical approach to industrial and research 
problems. 

Topics to be studied during the ten-day quality control course are Significance 
Tests and Confidence Intervals, Significance of Differences, Linear Correlation 
and Regression, Single Sampling for Measurements, Sequential Sampling for 
Measurements, Multiple Correlation, and Analysis of Variance. Instructors 
include Profs. Irving W. Burr, the course director, and Charles R. Hicks, both
of Purdue, Cecil C. Craig of the University of Michigan and Gayle McElrath 
of the University of Minnesota. The dates for this course are September 5-15. 

The second course is on Design of Experiments and is for statisticians, quality 
control personnel, engineers, and others concerned with planning, analyzing, 
and interpreting the results of industrial experiments. The dates for this course 
are June 7-17. This will be the third year this course has been offered. 

Professor Hicks, of the mathematical and statistical staff, the course director, 
emphasizes that this advanced course is for persons who have had previous 
statistical training, including work on tests of hypotheses, linear correlation, 
and at least an introduction to analysis of variance. 

Topics included in the short course are Review of Analysis of Variance, 
Principles of Experimental Design, Variance Component Analysis, Randomized 
Blocks and Latin Squares, Factorial Experiments, Split Plot Designs, Con- 
founding in Factorial Experiments, Incomplete Block Designs, Fractional 
Replications, and Introduction to Evolutionary Operation (EVOP). 

In addition to Professor Hicks, instructors will include Prof. Clyde Y. Kramer, 
Virginia Polytechnic Institute, and Professor McElrath. Enrollment will be 
limited to a maximum of 25-30. 

The third ten-day short course, The Mathematical Techniques of Operations
Research, will be offered at Purdue University for the second time this
summer, on June 5-15.
The course has been designed for statisticians, quality control analysts,
engineers, and other technical personnel in industrial and management positions.
Emphasis will be placed on the mathematical techniques of operations research 
and the application of these methods to current industrial and military problems. 

These methods involve the construction of mathematical models representing 
the operation of industrial management or a military organization, and suggest 
the best solutions to problems involved. Among the topics to be discussed during 
the course are inventory control models, waiting line models, linear programming, 
simplex method, transportation methods, production scheduling models, search 
theory, cost-effectiveness studies, and systems analysis. 

Instructors for the short course are Prof. Paul Randolph of Purdue, who is 
the short course director, Albert Madansky, of the RAND Corporation, and 
Prof. Bernard Lindgren, of the University of Minnesota. Enrollment in this 
course will be limited. 

All three courses are sponsored jointly by the Statistical Laboratory and the 
Division of Adult Education at Purdue. Further information about any of the 
courses may be obtained by writing to the Division of Adult Education, Purdue 
University, Lafayette, Indiana. 


NEW YORK UNIVERSITY LIFE TESTING 
AND RELIABILITY PROGRAM 


New York University, Division of General Education, Bureau of Conferences
and Institutes announces a combined program:
I. THE STATISTICS OF LIFE TESTING, June 12-23, 1961, and
II. SPECIAL TOPICS IN RELIABILITY THEORY, June 26-30, 1961, given as units
of a three-week course from June 12 to 30, 1961, at New York University.

I. The Statistics of Life Testing—June 12-23, 1961 
TOPICS INCLUDE: 

1. Stochastic models for failure 

2. Design and analysis of life tests assuming the exponential distribution 

3. Analysis of life-test data for special forms of the distribution of failures
different from the exponential

4. Analysis of life-test data when the form of the distribution is unknown

5. Inference from “accelerated life tests”

6. Inference from “incomplete data”

7. Multifactor experimental designs and life tests.

Applicants should have a working knowledge of probability and statistics
equivalent to the material covered in S. S. Wilks’ ELEMENTS OF STATISTICS and
an advanced undergraduate course in statistics or probability.

Tuition for this unit: $260.00 (for the combined program, Units I and II, $350.00)

II. Special Topics in Reliability Theory—June 26-30, 1961 
TOPICS INCLUDE:

1. Equipment maintenance policies 

2. Inventory policies 

3. Equipment availability 

4. Certain aspects of redundancy in complex equipment 
Applicants should have a background equivalent to the New York University 
courses on THE STATISTICS OF LIFE TESTING (previously given in August 1959
and March 1960) and a knowledge of Markov processes equivalent to a course
from the text, INTRODUCTION TO PROBABILITY AND ITS APPLICATIONS by W.
Feller.

Tuition for this unit: $150.00 (for the combined program, Units I and II, $350.00)

Faculty: Dr. BENJAMIN EPSTEIN, the principal instructor, is one of the leading
developers of the statistical theory concerned with the design and
analysis of life tests. Dr. Epstein, consultant to business and industry,
was formerly Professor of Mathematics and Statistics at Wayne
State University and Stanford University.

Mr. WILLIAM ALLEN is a Research Associate and Assistant Director
of the Statistical Techniques Research Group at Princeton University.

Detailed information and application forms will be sent in February. However, 
you may wish at this time to reserve the period, June 12-30, 1961, on your 
calendar. 

Inquiries should be addressed to:


RAYMOND N. WILBURN, Executive Director
Bureau of Conferences and Institutes 
NEW YORK UNIVERSITY 

6 Washington Square North 

New York 3, N.Y. 


STATISTICIAN 
EXPERIMENTAL DESIGN 


Major food processing company has above average
opportunity available in mathematical research-experimental design program


Responsibilities will include

• designing and analyzing experiments in research and development area
• developing new techniques
• training personnel in the use of mathematical techniques


Prefer a man with Master’s degree in mathematics or statistics with 2
to 3 years related experience • Above average starting salary and
advancement opportunity • Midwest location


Send complete resume in confidence to: 


TECHNOMETRICS 


Box 587, Benjamin Franklin Station 
Washington 4, D. C. 





BIOMETRICS 
Journal of the Biometric Society 
Vol. 17, No. 1        CONTENTS        March 1961


Marginal Percentages in Multiway Tables of Quantal Data With
Disproportionate Frequencies
    F. Yates
Some Classification Problems with Multivariate Qualitative Data
    William G. Cochran and Carl E. Hopkins
The Standardization of Tuberculin-Hypersensitivity
    M. Stone and R. A. Bruce
Optimum Estimation of Gradient Direction in Steepest Ascent Experiments
    Samuel H. Brooks and M. Ray Mickey
A Stochastic Study of the Life Table and Its Applications III.
The Follow-Up Study with the Consideration of Competing Risks
    Chin Long Chiang
The Spearman Estimator for Serial Dilution Assays
    Eugene A. Johnson and Byron Wm. Brown, Jr.
The Fitting of a Generalization of the Logistic Curve
    J. A. Nelder
Combined Analysis of Balanced Incomplete Block Designs with
Some Common Treatments
    M. V. Pavate
Generalized Asymptotic Regression and Non-Linear Path Analysis
    Malcolm E. Turner, Robert J. Monroe, and Henry L. Lucas
On the Analysis of Split-Plot Experiments
    H. Leon Harter

Queries and Notes

Confidence Intervals for Recombination Experiments with Microorganisms
    A. W. Kimball
Estimation of Proportion from Zero-Truncated Binomial Data
    G. N. Wilkinson

Book Reviews

E. J. Gumbel: Statistics of Extremes
    B. F. Kimball
A. Vessereau: Methodes Statistiques en Biologie et en Agronomie
    P. Robinson
A. G. Love, E. L. Hamilton and I. L. Hellman: Tabulating Equipment
and Army Medical Statistics
    W. W. Holland


Biometrics is published quarterly. Its objects are to describe and exemplify the use of 
mathematical and statistical methods in biological and related sciences, in a form assimilable 
by experimenters. The annual non-member subscription rate is $7. Inquiries, orders for back 
issues and non-member subscriptions should be addressed to: 


BIOMETRICS 

Department of Statistics 
The Florida State University 
Tallahassee, Florida 





INDUSTRIAL QUALITY CONTROL 
• Vol. XVII, No. 7, January 1961


SQC vs. Intuitive Inspection Chester T. Raymo
Critique of Vendor Quality Rating Systems Leroy Folks 
Quality Control in the Construction Industry Joseph J. Waddell 


A Decentralized Team Approach to Quality and Reliability 
T. E. Smith and A.W. Wortham 


• Vol. XVII, No. 8, February 1961


Are Statistical Life Testing Procedures Robust?
National Bureau of Standards 


Data Fun—Data Frustration—Data Failure C. Gadzinski 


Automatic Production Recording: Yield and Control James A. Curry


Applications of Industrial Statistics in Research and Development 
Charles A. Bicking 


Intercomparison of Laboratory Test Methods 
National Bureau of Standards 


• Vol. XVII, No. 9, March 1961


Quality Control Programs Should Be Cost Reduction 
Programs Harmon S. Bayer 


Mechanization, Quality Control, and Automation 
A.W. Wortham, Kenneth W. Davidson and Norman S. Ince 


Distribution of the Range and Mid-range When Sampling from a
Negatively Skewed, Right Triangle Population 
B. Ostle, R. R. Prairie and J. M. Wiesen 


Examples of Designed Experiments H. C. Hamaker 


Lot Acceptance Sampling by Variables Using the Median and 
Quasi-Range Albert Beck, Jr. 


Industrial Quality Control is published monthly by the American Society for 
Quality Control, Inc. All correspondence concerning membership in the society and 
subscriptions to the journal should be addressed to American Society for Quality 
Control, Inc., Rm 6185 Plankinton Bldg., 161 West Wisconsin Ave., Milwaukee 3, 
Wisconsin. 





BIOMETRIKA 


Volume 47, Parts 3 and 4 DECEMBER 1960
CONTENTS 
Memoirs: 


LESLIE, P. H. AND GOWER, J. C. The properties of a stochastic model for the
predator-prey type of interaction between two species.

BLOM, GUNNAR. Hierarchical birth and death processes I. Theory.

BLOM, GUNNAR. Hierarchical birth and death processes II. Applications.

GLENN, W. A. A comparison of the effectiveness of tournaments.

PEARCE, S. C. Supplemented balance.

FOLKS, JOHN LEROY AND KEMPTHORNE, OSCAR. The efficiency of blocking in incomplete
block designs.

HAIGHT, FRANK A. Queueing with balking, II.

DAVID, H. A. AND PEREZ, CARMEN A. On comparing different tests of the same
hypothesis.

FARLIE, D. J. G. The performance of some correlation coefficients for a general
bivariate distribution.

McFADDEN, J. A. Two expansions for the quadrivariate normal integral.

LAHA, R. G. AND LUKACS, E. On a problem connected with quadratic regression.

McCULLOUGH, ROGER S., GURLAND, JOHN AND ROSENBERG, LLOYD. Small sample
behaviour of certain tests of the hypothesis under variance heterogeneity.

SUKHATME, BALKRISHNA V. Power of some two-sample non-parametric tests.

EWAN, W. D. AND KEMP, K. W. Sampling inspection of continuous processes with no
autocorrelation between successive results.

BLYTH, COLIN R. AND HUTCHINSON, DAVID W. Table of Neyman-shortest unbiased
confidence intervals for the binomial parameter.

BENNETT, B. M. AND HSU, P. On the power function of the exact tests for the 2 × 2
contingency table.

SIMPSON, J. A. AND WELCH, B. L. Table of the bounds of the probability integral when
the first four moments are given.

SEVERO, NORMAN C. AND ZELEN, MARVIN. Normal approximation to the chi-square
and non-central F probability functions.

BARTON, D. E., DAVID, F. N. AND O’NEILL, ANNE F. Some properties of the
distributions of the logarithm of non-central F.

LINDLEY, D. V., EAST, D. A. AND HAMILTON, P. A. Tables for making inferences about
the variance of a normal distribution.

BARTON, D. E., DAVID, F. N. AND MERRINGTON, M. Tables for the solution of the
exponential equation, exp(−a) + ka = 1.

Miscellanea: 
Contributions by G. E. Bardwell, B. R. Bhat and J. Gani, D. R. Cox, R. N. Curnow, 
N. Gilbert, W. M. Harper and J. A. Macdonald, N. L. Johnson and D. H. Young, 
M. G. Kendall, A. N. Kshirsagar, H. Linhart, T. A. Ramsubban. 


Reviews: Other Books Received: 
The subscription, payable in advance, is 54/- (or $8.00) per volume (including
postage). Cheques should be made payable to Biometrika, crossed “a/c Biometrika
Trust” and sent to The Secretary, Biometrika Office, University College, London,
W.C.1. All foreign cheques must be drawn on a Bank having a London agency. 


Issued by THE BIOMETRIKA OFFICE, University College, London 





INTERNATIONAL 
JOURNAL OF ABSTRACTS 
STATISTICAL THEORY AND METHOD 


COVERAGE OF THE JOURNAL 


The aim of this journal is to give complete coverage of published papers in the field 
of statistical theory (including associated aspects of probability and other mathematical 
methods) and new published contributions to statistical method. 

There are approximately two hundred and fifty journals published in various parts 
of the world which are wholly or partly devoted to the field of statistical theory and 
method and which are brought within the scope of this journal of abstracts. 

In addition to the vast array of journal literature, abstracts of the special collections
of papers as published in reports of conferences, symposia and seminars, together with
the published reports of experiment and other research stations, are also included.


FORMAT OF THE JOURNAL 


The abstracts are about 400 words long—the recommendation of UNESCO for the 
“long” abstract service: they are in the English language although the language of the 
original paper is indicated on the abstract together with the name of the abstractor. In 
addition, the address of the author is given in sufficient detail to facilitate contact in
order to obtain further details or request an offprint. 

A scheme of classification has been developed for the abstracts that is flexible and 
facilitates the transfer of code numbers to punched cards. Each abstract has two classi- 
fication numbers: the primary number in heavy type to indicate the basic topic of the 
paper and the secondary number in brackets to take account of the most important 
cross-reference. A unique aspect of this journal is that the pages are colour-tinted accord- 
ing to the main sections of the classification. This method of colour-coding the pages 
provides a distinctive and powerful visual aid to the identification of abstracts in what- 
ever manner the journal is filed for reference. 


THIS JOURNAL is prepared by an international editorial organisation under a General 
Editor in association with a Managing Editor for the American and Pacific Area. The 
General Editor works in conjunction with the Research Techniques Unit (London 
School of Economics) and the Managing Editor is on the staff of the Institute of Sta- 
tistics (North Carolina State College). 


GENERAL EDITOR 


Dr. Wm. R. Buckland 
c/o 55 Broadway, London, S.W. 1, England 


MANAGING EDITOR (AMERICA AND PACIFIC) 


Prof. R. L. Anderson 
North Carolina State College, Raleigh, N.C., U.S.A. 


Annual subscription, £5 (U.S.A. & CANADA $16.00) 
Single number, 30s (U.S.A. & CANADA $4.50) 


PUBLISHED QUARTERLY FOR 
THE INTERNATIONAL STATISTICAL INSTITUTE BY 
Georges Darmois, 1888-1960
The Existence and Construction of Balanced Incomplete Block Designs
Random Allocation Designs II: Approximate Theory for Simple Random Allocation
    A. P. Dempster
Sampling Moments of Means from Finite Multivariate Populations
    D. W. Behnken
a Discrete Memoryless Channel
An Exponential Bound on the Strong Law of Large Numbers for Linear
Stochastic Processes with Absolutely Convergent Coefficients
    L. H. Koopmans
Expected Utility for Queues Servicing Messages with Exponentially Decaying Utility
    Frank A. Haight
On the Coding Theorem for the Noiseless Channel
    Patrick Billingsley


Notes: 


The Essential Completeness of the Class of Generalized 
Sequential Probability Ratio Tests 


A Problem in Survival 

First Passage Time for a Particular Gaussian Process 

A Note on the Ergodic Theorem of Information Theory 
Remark Concerning Two-State Semi-Markov Processes 


An Example of an Ancillary Statistic and the Combination of 
Two Samples by Bayes’ Theorem 


Abstracts of Papers 
Publications Received


The purpose of the Institute of Mathematical Statistics is to encourage the development, 
dissemination and application of mathematical statistics. Membership dues, which include a 
subscription to the Annals of Mathematical Statistics are $10.00 per year for residents of the 
United States or Canada and $5.00 a year for residents of other countries. Inquiries regarding 
membership to the Institute should be sent to the Secretary: G. E. Nicholson Jr., Department of 
Statistics, University of North Carolina, Chapel Hill, North Carolina. 











PREPARATION OF MANUSCRIPTS 


Manuscripts should be submitted to the office of the editor: J. S. Hunter, 
Mathematics Research Center, U. S. Army; The University of Wisconsin; 
Madison, Wisconsin. Each manuscript should be typewritten, double spaced, 
with wide margins at sides, top, and bottom. The original should be submitted 
with two additional copies, on paper that will take corrections. Dittoed or 
mimeographed papers are acceptable only if completely legible. Footnotes 
should be avoided and replaced by remarks in the text, or placed in an appendix. 
Preferably, references in the manuscript should appear as (Jones, A. B., 1958), 
and again later in alphabetical order in a list of references. Alternatively, refer- 
ences may be numbered, e.g. [1], as they appear in the manuscript and be listed 
in this sequence in the list of references. In the reference list, each reference 
should contain, in the order indicated, the name and initials of the author 
followed by those of the co-authors, date of publication, title of reference, 
source, volume number and page. References to books should include pub- 
lisher’s name and location. 

Figures, charts, and diagrams should be professionally drawn on plain white 
paper or tracing cloth in black India ink twice the size they are to be printed. 
A full page diagram, in print, measures 7.25 × 4.75 inches.

As far as possible, formulas should be typewritten and symbols not available 
on a typewriter carefully inserted in ink. Authors are asked to keep in mind the 
typographical difficulties of complicated mathematical formulae. The difference 
between capital and lower-case letters should be clearly shown; care should be 
taken to avoid confusion between such pairs as zero and the letter O, the numeral
1 and the letter l, numeral 1 used as superscript and prime (′), alpha and a, kappa
and k, mu and u, nu and v, eta and n, etc. Subscripts or superscripts should be
clearly below or above the line. Bars above groups of letters (e.g., $\overline{\log z}$) and
underlined letters (e.g., $\underline{z}$) are difficult to print and should be avoided. Symbols
are automatically italicized by the printer and should not be underlined on 
manuscripts. Boldface letters may be indicated by underlining with a wavy line 
on the manuscript; boldface subscripts and superscripts are not available. 
Complicated exponentials should be represented with the symbol exp particu- 
larly when appearing in the text, that is, 


$\exp[(a^2 + b^2)^{1/2}]$ should be used in place of $e^{(a^2 + b^2)^{1/2}}$.


In writing square roots the fractional exponent is preferable to the radical sign. 
Fractions in the body of the text (and when possible in displayed expressions) 
and fractions occurring in the numerators or denominators of fractions are 
preferably written with the solidus; thus 


$(a + b)/(c + d)$ rather than $\dfrac{a + b}{c + d}$.

Authors will ordinarily receive only galley proofs. Fifty offprints without 
covers will be furnished free. Costs for additional reprints and covers can be 
furnished on request. 